Memex Oil Gush
- To: [email protected]
- Subject: Memex Oil Gush
- From: [email protected] (stef)
- Date: Mon, 20 Apr 2015 16:46:23 +0200
- In-reply-to: <CAD2Ti28PORvP-Ow6Yj92BXX7a_TUSTpRCzgVQbxXzJNUB=9ABg@mail.gmail.com>
- References: <[email protected]> <[email protected]> <CAJVRA1SRpJmQiegg9Q8=zODycFt+V74pOKkZtZzg5xkC_YTJNw@mail.gmail.com> <CAD2Ti28PORvP-Ow6Yj92BXX7a_TUSTpRCzgVQbxXzJNUB=9ABg@mail.gmail.com>
On Mon, Apr 20, 2015 at 10:20:28AM -0400, grarpamp wrote:
> Some memex bits now open sourced...
>
> http://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/
> TJBatchExtractor is what's going open source today. It allows a user to
> extract data, such as a name, organisation or location, from advertisements.
this sounds interesting. so far there was open-calais from reuters, which did
this, but only as a centralized service, even if gratis, or you could build
your own corpora if your domain is not covered by the widely available ones.
however there are lots of problems with non-english names. for evaluating
such entity extractors i recommend testing them with a data set containing
eu public officials, with names in greek, bulgarian, and from some
latin-speaking country and some slavic-speaking one, and you have something
that can confuse such entity extraction quite sufficiently. i guess i'm gonna
give this a test, maybe it's better. but i guess this again mostly depends on
the corpus.
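
a rough sketch of what such a test could look like, in python. everything in
it is an assumption made up for illustration: the sample names are placeholders
(not a real data set of officials), `naive_extract` is a hypothetical ascii-only
stand-in (it is neither TJBatchExtractor nor open-calais), and `score` is just
one simple way to count exact-match precision and recall.

```python
import re
import unicodedata

# Made-up placeholder samples: (text, names expected to be extracted).
SAMPLES = [
    ("Jane Smith chaired the meeting.", ["Jane Smith"]),        # english
    ("Γιώργος Παππάς chaired the meeting.", ["Γιώργος Παππάς"]),  # greek
    ("Иван Петров chaired the meeting.", ["Иван Петров"]),        # bulgarian
    ("José Muñoz chaired the meeting.", ["José Muñoz"]),        # spanish
    ("Jiří Dvořák chaired the meeting.", ["Jiří Dvořák"]),      # czech
]


def normalize(name):
    """NFC-normalize and case-fold so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name).casefold().strip()


def naive_extract(text):
    """Hypothetical stand-in extractor: two adjacent capitalized ASCII words."""
    return re.findall(r"[A-Z][a-z]+ [A-Z][a-z]+", text)


def score(extract, samples):
    """Return (precision, recall) of `extract` using exact name matches."""
    tp = fp = fn = 0
    for text, expected in samples:
        found = {normalize(n) for n in extract(text)}
        gold = {normalize(n) for n in expected}
        tp += len(found & gold)   # names correctly extracted
        fp += len(found - gold)   # extracted but not expected
        fn += len(gold - found)   # expected but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


precision, recall = score(naive_extract, SAMPLES)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

the stand-in only finds the english name and misses the other four, which is
the kind of gap this test is meant to expose. to try a real extractor, swap
`naive_extract` for a function that wraps it and returns a list of name strings.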
--
otr fp: https://www.ctrlc.hu/~stef/otr.txt