Makefiles, scenarios and support scripts for the development of HamleDT within the Treex infrastructure

HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of
existing dependency treebanks (or dependency conversions of other treebanks),
transformed so that they all conform to the same annotation style. For more
information please see the project website at

http://ufal.mff.cuni.cz/hamledt

This repository contains the makefiles and support scripts needed for HamleDT
development. You also need Treex and Interset, which live in separate
repositories. In particular, the tree transformation and harmonization code
is implemented as Treex blocks and is part of Treex; see the ufal/treex
GitHub repository.



History:

These files were originally stored in the TectoMT Subversion repository
(https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk/treex/devel/hamledt).
Some important points in time:

r5974  (2011-06-27 zabokrtsky) ... created treex/devel/normalize_treebanks
r7684  (2011-12-31) .............. approximate date of HamleDT 0.9 or 1.0 (the release was neither fixed nor archived)
r8819  (2012-06-11 popel) ........ normalize_treebanks renamed to hamledt
r11004 (2013-08-28 rosa) ......... hamledt copied to hamledt2
r11606 (2014-02-15 zeman) ........ HamleDT release 1.5 (Prague, article in LRE)
r11870 (2014-03-14 zeman) ........ removed old hamledt (after checking all languages for HamleDT release 2.0)
r11991 (2014-03-23 zeman) ........ hamledt2 renamed to hamledt
r12700 (2014-05-24 zeman) ........ HamleDT release 2.0 (Prague + Stanford)
r14841 (2015-04-23 zeman) ........ pruned large generatable files, hamledt with history copied to GitHub ufal/hamledt
r14847 (2015-04-24 zeman) ........ hamledt removed from the Subversion repository
commit 19f47665fed00b9defe5119b557ca950384db0ba (2015-08-18 zeman) ..... HamleDT release 3.0 (UD)

See also

https://svn.ms.mff.cuni.cz/trac/tectomt_devel/ (password-protected access, only for ÚFAL members)
https://github.com/ufal/hamledt



Notes on migration to GitHub:

A users.txt file, which maps Subversion user names to Git authors, was created
following the instructions in
http://git-scm.com/book/es/v2/Git-and-Other-Systems-Migrating-to-Git

git svn clone https://svn.ms.mff.cuni.cz/svn/tectomt_devel --authors-file=users.txt --no-metadata --trunk=trunk/treex/devel/hamledt --prefix=svn/
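
The file passed to --authors-file has one line per Subversion user, in the
form "svnuser = Full Name <email>". As a minimal sketch of how a skeleton for
such a file might be produced (not necessarily how it was done for HamleDT),
the helper below extracts the distinct author names from the XML log. The
function name authors_stub and the example user jdoe are hypothetical.

```shell
# Hypothetical helper: read `svn log --xml --quiet` output on stdin and print
# one "svnuser = " stub line per distinct author, sorted alphabetically.
authors_stub() {
    grep '<author>' \
        | sed -e 's/.*<author>\(.*\)<\/author>.*/\1 = /' \
        | sort -u
}

# Typical use inside a Subversion working copy (requires the svn client):
#   svn log --xml --quiet | authors_stub > users.txt
# Each stub is then completed by hand, for example:
#   jdoe = John Doe <jdoe@example.com>
```

The stubs have to be filled in manually because the Subversion log records
only the login name, not the full name or e-mail address.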

Tag statistics and similar files that were comparatively large and could be
regenerated if necessary were removed from the repository. The history was
then pruned using the BFG Repo-Cleaner (https://rtyley.github.io/bfg-repo-cleaner/),
with the blob size limit set to 400K. Subsequently, Git garbage collection
was run as recommended in the BFG documentation:

java -jar bfg-1.12.3.jar --private -b 400K hamledt
cd hamledt
git reflog expire --expire=now --all && git gc --prune=now --aggressive

git remote add origin https://github.com/ufal/hamledt.git
git push -u origin master
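
The result of the pruning step above can be checked by listing any blobs in
the history that still exceed the size limit. The following is only a sketch:
the helper name blobs_over_limit is hypothetical, and it assumes that 400K
means 400 * 1024 bytes.

```shell
# Hypothetical helper: read "type sha size path" lines on stdin and print
# those lines that describe a blob larger than the limit (bytes, argument 1).
blobs_over_limit() {
    awk -v limit="$1" '$1 == "blob" && ($3 + 0) > (limit + 0) { print $0 }'
}

# Typical use inside the pruned repository (requires git):
#   git rev-list --objects --all \
#     | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
#     | blobs_over_limit $((400 * 1024))
```

Empty output means that no blob above the limit is left anywhere in the
history reachable from the repository's refs.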
