Solr common issues

By W.Zh   May 2015

Issue:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/solr/util/SolrCLI : Unsupported major.minor version 51.0

When:

After unzipping Solr, you try to start it by running:

bin/solr start -e cloud -noprompt

 

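Reason:

The installed Java version is older than 7. Class file version 51.0 corresponds to Java 7, so an older JVM cannot load the Solr classes.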
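The "51.0" in the error message is the class file format version, stored in the first bytes of every .class file. As a rough sketch in Python (the function names read_class_version and java_release are my own, not part of Solr or the JDK), the header can be decoded and the major version mapped to a Java release like this:

```python
import struct


def read_class_version(header: bytes):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Big-endian layout: 4-byte magic 0xCAFEBABE, 2-byte minor, 2-byte major.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> str:
    """Map a class file major version to the Java release that introduced it."""
    if major < 45:
        raise ValueError("unknown class file version")
    if major <= 48:
        # 45 -> 1.1 (also used by 1.0.2), 46 -> 1.2, 47 -> 1.3, 48 -> 1.4
        return "1." + str(major - 44)
    # 49 -> 5, 50 -> 6, 51 -> 7, 52 -> 8, and so on
    return str(major - 44)
```

For the error above, java_release(51) gives "7", which is why a Java 6 or older runtime refuses to load the class.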

How to fix:
Upgrade Java to version 7 or later. Please see my page:
How to upgrade the Java version on Ubuntu


How to calculate the similarity or distance between two articles?

By W.ZH

May 2015

1. Use topic modelling. Map the text content of each article onto topic dimensions, then calculate the similarity between the resulting topic vectors.

Tools:
gensim, an open-source Python library

link:
http://www.52nlp.cn/%E5

2. Use cosine similarity. The steps are:

(1) Use TF-IDF to find the keywords of the two articles.

(2) Combine the two keyword sets into one set, and count the frequency of each keyword in each article.

(3) Create a frequency vector for each article.

(4) Calculate the cosine similarity of the two vectors. The larger the value, the more similar the two articles are.
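The four steps above can be sketched in plain Python. This is only a minimal illustration under my own assumptions: the function names are made up for this post, the tokenizer is a simple regex that only handles ASCII words, TF-IDF needs a background corpus to compute document frequencies (passed in as corpus), and the smoothed IDF formula is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(tokens, corpus_token_sets, k):
    """Step 1: rank one article's words by TF-IDF and keep the top k."""
    counts = Counter(tokens)
    n_docs = len(corpus_token_sets)
    scores = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for doc in corpus_token_sets if word in doc)
        # Smoothed IDF (an assumption; other variants exist).
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = tf * idf
    # Sort by score, highest first; break ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]


def article_similarity(text_a, text_b, corpus, k=20):
    """Cosine similarity of two articles over their combined keywords."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    corpus_token_sets = [set(tokenize(doc)) for doc in corpus]

    # Step 2: combine the two keyword sets into one.
    keys = sorted(set(top_keywords(tokens_a, corpus_token_sets, k))
                  | set(top_keywords(tokens_b, corpus_token_sets, k)))

    # Step 3: build one frequency vector per article.
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    vec_a = [freq_a[w] for w in keys]
    vec_b = [freq_b[w] for w in keys]

    # Step 4: cosine similarity = dot product / product of vector lengths.
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two articles with no keywords in common score 0.0, and the score approaches 1 as their keyword frequencies line up. For Chinese text, tokenize would need to be replaced with a proper word segmenter.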

Reference:
TF-IDF (term frequency–inverse document frequency)

http://www.ruanyifeng.com/blog/2013/03/cosine_similarity.html

http://en.wikipedia.org/wiki/Tf%E2%80%93idf

http://www.ruanyifeng.com/blog/2013/03/tf-idf.html