I wanted a good tool for named entity extraction on tweets and came across this: https://github.com/aritter/twitter_nlp.
Here is how I installed it:
2. unzip master
3. cd master
4. yum install glibc-static
5. sh build.sh
To run it, go to the directory containing the python folder.
1. export TWITTER_NLP=./
2. cat test.1k.txt | python python/ner/extractEntities2.py
I made another sample file, test2, with these 2 tweets:
I live in Jodhpur.
usgs reports a m0.46 #earthquake 13km nw of jodhpur, rajasthan on 5/1/15 @ 15:50:46 utc http://t.co/mqgsvgnkbo #quake
and then ran:
cat test2 | python python/ner/extractEntities2.py --classify --pos --event
Results:
I/O/PRP/O live/O/VBP/B-EVENT in/O/IN/O Jodhpur/B-person/NNP/O ./O/./O
usgs/O/NNP/O reports/O/VBZ/B-EVENT a/O/DT/O m/O/NN/O 0.46/O/HT/O #earthquake/O/HT/O 13km/O/HT/O nw/O/NN/O of/O/IN/O jodhpur/O/NN/O ,/O/,/O rajasthan/O/VBN/O on/O/IN/O 5/1/15/O/CD/O @/O/IN/O 15:50/O/CD/O :/O/:/O 46/O/CD/O utc/O/:/O http://t.co/mqgsvgnkbo/O/URL/O #quake/O/HT/O
So Jodhpur is classified as a person, though it should be a location. In the second tweet, the lowercase jodhpur and rajasthan are not tagged as entities at all.
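To check the tags without reading the raw output by eye, here is a small Python sketch that parses one result line. It assumes each item has the form token/entity/POS/event (what I get with --classify --pos --event) and that multi-word entities continue with an I- tag; I have not confirmed either in the tool's documentation. The function names parse_tagged_line and entities are my own, not part of twitter_nlp.

```python
def parse_tagged_line(line, n_tags=3):
    """Split one output line into (token, entity, pos, event) tuples.

    Assumes every item carries exactly n_tags tags after the token.
    """
    rows = []
    for item in line.split():
        # Split from the right: tokens such as dates and URLs contain "/".
        rows.append(tuple(item.rsplit("/", n_tags)))
    return rows


def entities(rows):
    """Collect (text, type) pairs from B-/I- entity tags (assumed scheme)."""
    spans = []
    current = None
    for token, ent, _pos, _event in rows:
        if ent.startswith("B-"):
            # A B- tag opens a new entity span.
            current = [[token], ent[2:]]
            spans.append(current)
        elif ent.startswith("I-") and current is not None:
            # An I- tag extends the span that is currently open.
            current[0].append(token)
        else:
            current = None
    return [(" ".join(tokens), ent_type) for tokens, ent_type in spans]


line = "I/O/PRP/O live/O/VBP/B-EVENT in/O/IN/O Jodhpur/B-person/NNP/O ./O/./O"
print(entities(parse_tagged_line(line)))  # [('Jodhpur', 'person')]
```

Splitting from the right matters here because tokens like 5/1/15 and the t.co URL contain slashes themselves.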
I don't know what I am doing wrong here.