NLU tools – Apache Tika

https://tika.apache.org/

Apache Tika is a must-have tool if you do Natural Language Understanding (NLU) work in Java. You have to prepare training material from many texts and articles, and Tika helps you extract the text from all kinds of documents, such as HTML, PPT, Word and other office formats, and many others.

“Tika detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). ”

Add this dependency to your Maven pom.xml:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.16</version>
  </dependency>

With this you can use the Tika core, for example to detect a document's file type.

If you also want to extract content, add the parsers module, plus any other modules you need.

 <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.16</version>
  </dependency>

Tika can also run as a RESTful service on a Jetty server, so you can call it through a web service API. It has a simple GUI too.

 

 


How to call a C/C++ shared lib from Java

There are two ways to do this: JNI and JNA. JNA is built on JNI but is simpler to use, and it does not require javah to generate a header file. I will give a JNA example today.

  1. Add the JNA lib to your Maven pom.xml:
    <dependency>
     <groupId>net.java.dev.jna</groupId>
     <artifactId>jna</artifactId>
     <version>4.4.0</version>
    </dependency>
  2. Create a c file like this:
    cd /home/test
    
    touch MyTest.c
    /* MyTest.c */
    #include <stdio.h>
    
    void aSimplePrint() {
     printf("Hello world from C!\n");
    }
  3. Compile the C file into a .so shared lib file:
    gcc -c -fPIC MyTest.c -o MyTest.o
    gcc -shared -o libMyTest.so MyTest.o
  4. Create a java file like this, TestJNA.java
    import com.sun.jna.Library;
    import com.sun.jna.Native;
    
    public class TestJNA {
    
        static {
            System.setProperty("jna.library.path", "/home/test");
      }
    
        public interface MyTest extends Library {
            public void aSimplePrint();
         }
    
        public static void main(String[] args) {
            MyTest ctest = (MyTest) Native.loadLibrary("MyTest", MyTest.class);
            ctest.aSimplePrint();
         }
    }

    Compile to get the TestJNA.class.

  5. Then you can run the java main to test it:
    java -cp .:/path/to/jna-4.4.0.jar TestJNA

That is the whole process. You may still hit problems, for example when the lib is a 3rd party one.
Here are some issues you may encounter along the way.

1. On Linux, loadLibrary("MyTest", ...) loads a file named libMyTest.so, so pay attention to the lib's file name.

2. "Exception in thread "main" java.lang.UnsatisfiedLinkError: no MyTest in java.library.path ........"
This means Java cannot find libMyTest.so in the library folder. This line solves that problem:
System.setProperty("jna.library.path", "/home/test");
If you still have the issue, you can also try loading the file by its full path:
System.load("/home/test/libMyTest.so");

3. If the lib loads but a symbol or function name cannot be found in it:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Error looking up function 'aSimplePrint': /home/test/libMyTest.so: undefined symbol: …………………

Use this command to list the symbols that the shared lib exports:
nm -D /home/test/libMyTest.so
A function that exists in the lib is listed with a T marker. For a C++ lib, the function must be declared extern "C"; otherwise nm shows a mangled name and the lookup by plain name fails.

4. The command:
file /home/test/libMyTest.so
tells you about the file, including whether it is 32-bit or 64-bit. A mismatch with the JVM causes problems with some 3rd party lib files.
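
Points 1 and 4 can also be checked from Java itself. The following is a minimal sketch, not part of JNA: the class name NativeLibCheck is made up for illustration, and the default path is simply the one used in this post. It prints the file name the JDK derives from a short library name on the current platform, and reads the ELF identification bytes of a .so file, where byte 4 is 1 for 32-bit and 2 for 64-bit.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Diagnostic sketch (hypothetical helper): expected lib file name and ELF word size. */
final class NativeLibCheck {

    /** Returns 32 or 64 for an ELF header, or -1 if the bytes are not a recognised ELF file. */
    static int elfBits(byte[] header) {
        boolean isElf = header.length >= 5
                && header[0] == 0x7F && header[1] == 'E'
                && header[2] == 'L' && header[3] == 'F';
        if (!isElf) {
            return -1;
        }
        // e_ident[EI_CLASS]: 1 = ELFCLASS32, 2 = ELFCLASS64
        switch (header[4]) {
            case 1:  return 32;
            case 2:  return 64;
            default: return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // File name the JDK derives from the short name on this platform.
        System.out.println("Expected file name: " + System.mapLibraryName("MyTest"));

        // Assumed default path, taken from the example in this post.
        Path lib = Paths.get(args.length > 0 ? args[0] : "/home/test/libMyTest.so");
        byte[] header = new byte[5];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(lib))) {
            in.readFully(header); // throws EOFException if the file is shorter than 5 bytes
        }
        System.out.println(lib + " is " + elfBits(header) + "-bit; JVM os.arch = "
                + System.getProperty("os.arch"));
    }
}
```

If the reported word size does not match the JVM architecture, the lib will not load no matter how the library path is set.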

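Multiple Java versions in Ubuntu

Install Java 8 (the command below installs the headless JRE; install openjdk-8-jdk instead if you need the full JDK)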

sudo apt-get update ; sudo apt-get install openjdk-8-jre-headless
sudo  update-java-alternatives --list

to list all the Java installations on the machine by name and directory, and then run

sudo  update-alternatives --config java

to choose which JRE/JDK to use. Each Java command (java, javac, javah, javaws, etc.) can be configured separately if you want different JDKs/JREs for different tasks. In that case,

sudo  update-alternatives --config [javac|java|javadoc|etc.]

will associate that Java task/command to a particular JDK/JRE.
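
To confirm which installation the java command now resolves to, a tiny program can print the standard system properties. This is only a sketch; the class name WhichJava is made up for illustration.

```java
/** Prints the version and install directory of the JVM that runs it (hypothetical helper). */
final class WhichJava {

    /** Builds a one-line summary from the standard java.version and java.home properties. */
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it once after each update-alternatives change; java.home should point at the directory of the JDK/JRE you selected.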

Solving JSON/Java polymorphism with Jackson

By W.ZH

When converting between JSON and Java objects, you will often face a polymorphism issue.
For example, the JSON for a Content could be:

{
  "type":"valuea",
  "value": {
    "valueaName": "corn",
    "bar":  "sweet"
  }
}

and also could be

{
  "type":"valueb",
  "value": {
    "valuebName": "toy",
    "color":  "yellow",
    "price":  "20"     
  }
}

The shape of the value object depends on the type field. The Java side therefore has to create a different value object depending on type. How can we do that?

Here is a solution with Jackson, using @JsonTypeInfo.

Let us define an abstract class Value and two classes, ValueA and ValueB, that extend it.

public abstract class Value {
}

and ValueA to inherit Value

@JsonInclude(JsonInclude.Include.NON_NULL)
public class ValueA extends Value {

    @JsonProperty("valueaName")
    private String valueaName = "";

    @JsonProperty("bar")
    private String bar;
....................................
....................................
}

and ValueB Class to inherit Value.

@JsonInclude(JsonInclude.Include.NON_NULL)
public class ValueB extends Value {

    @JsonProperty("valuebName")
    private String valuebName = "";

    @JsonProperty("color")
    private String color;

    @JsonProperty("price")
    private String price;
....................................
....................................
}

 

After that we can create a Content class in which the "type" field works as an EXTERNAL_PROPERTY that controls which Value object sits inside the Content:

@JsonInclude(JsonInclude.Include.NON_NULL)
public class Content {

    @JsonProperty("type")
    private String type;

    @JsonInclude(JsonInclude.Include.NON_NULL)
    @JsonProperty("value")
    private Value value = null;
    
    public Content (){
    }
    public String getType() {
        return type;
    }

    public void setType(String type) {
        this.type = type;
    }

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.EXTERNAL_PROPERTY, property = "type")
    @JsonSubTypes({ @Type(value = ValueA.class, name = "valuea"),@Type(value = ValueB.class, name = "valueb")})
    public void setValue(Value value) {
        this.value = value;
    }

    public Value getValue() {
        return value;
    }

}

In this way, @JsonTypeInfo defines how to rely on "type" to dynamically serialize and deserialize the JSON/Java object.

For this to work properly, your Jackson version must be higher than 2.5, as I hit a duplicated-fields bug in Jackson 2.5. Here is a sample dependency:

 

        <!-- Jackson -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.8.0</version>
        </dependency>

 

 

How to run Memcached on Mybatis

By W.ZH

1. Install Memcache

To start, install memcached via apt-get, for example on Ubuntu 12.04:

sudo apt-get install memcached

The package starts memcached automatically:

ps -ax | grep memcac
21199 ?        Sl     0:00 /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1

Refer to the official MyBatis page: http://mybatis.github.io/memcached-cache/
Add the jar to your Maven pom.xml:

<dependency>
    <groupId>org.mybatis.caches</groupId>
    <artifactId>mybatis-memcached</artifactId>
    <version>1.0.0</version>
  </dependency>
 Then add the Memcached cache to each MyBatis mapper that should use it:
<mapper namespace="org.acme.FooMapper">
  <cache type="org.mybatis.caches.memcached.MemcachedCache" />
  ...
</mapper>
Create a memcached.properties file and put it on your classpath, e.g. in the resources folder.
# any string identifier
org.mybatis.caches.memcached.keyprefix=_mybatis_
# space separated list of ${host}:${port}
org.mybatis.caches.memcached.servers=127.0.0.1:11211
org.mybatis.caches.memcached.connectionfactory=net.spy.memcached.DefaultConnectionFactory
# the expiration time (in seconds)
org.mybatis.caches.memcached.expiration = 600
org.mybatis.caches.memcached.asyncget = true
# the async get timeout and its unit
org.mybatis.caches.memcached.timeout = 600
org.mybatis.caches.memcached.timeoutunit = java.util.concurrent.TimeUnit.SECONDS
# if true, objects will be GZIP compressed before putting them to Memcached
org.mybatis.caches.memcached.compression = false

If you list multiple cache servers in org.mybatis.caches.memcached.servers, the client can fail over among them and automatically continue with the live ones if one dies.
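
To make the format of the servers value concrete, here is a small sketch that splits a space-separated host:port list into socket addresses. It is not the parser the MyBatis integration uses; the class name MemcachedServersSketch and the second server address are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative parser for a space-separated host:port list (hypothetical helper). */
final class MemcachedServersSketch {

    /** Splits "host:port host:port ..." into unresolved socket addresses. */
    static List<InetSocketAddress> parseServers(String value) {
        List<InetSocketAddress> servers = new ArrayList<>();
        for (String entry : value.trim().split("\\s+")) {
            if (entry.isEmpty()) {
                continue; // blank value
            }
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected host:port but got " + entry);
            }
            String host = entry.substring(0, colon);
            int port = Integer.parseInt(entry.substring(colon + 1));
            // createUnresolved avoids a DNS lookup while parsing.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return servers;
    }

    public static void main(String[] args) throws IOException {
        // The second address is an assumed example of an extra server.
        String config = "org.mybatis.caches.memcached.servers=127.0.0.1:11211 127.0.0.2:11211\n";
        Properties props = new Properties();
        props.load(new StringReader(config));
        List<InetSocketAddress> servers =
                parseServers(props.getProperty("org.mybatis.caches.memcached.servers"));
        System.out.println(servers.size() + " servers configured: " + servers);
    }
}
```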

This integration is based on Spymemcached, an asynchronous, single-threaded Memcached client. When you call any caching-related method on Spymemcached's MemcachedClient, it is handled asynchronously: the method writes the details of the operation into a queue and returns control to the caller. The actual interaction with the Memcached server is handled by a separate thread that runs in the background.
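
The queue-plus-background-thread model can be illustrated with plain JDK classes. This is a toy model only, not Spymemcached code: the class AsyncCacheSketch is made up, an in-memory map stands in for the Memcached server, and a single-thread executor plays the role of the background I/O thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model of an asynchronous cache client; NOT the Spymemcached implementation. */
final class AsyncCacheSketch implements AutoCloseable {

    // Stand-in for the remote Memcached server.
    private final Map<String, String> store = new ConcurrentHashMap<>();

    // One background thread drains the operation queue in submission order.
    private final ExecutorService io = Executors.newSingleThreadExecutor();

    /** Queues a write and returns at once; the Future completes when the write is done. */
    Future<Boolean> set(String key, String value) {
        return io.submit(() -> {
            store.put(key, value);
            return Boolean.TRUE;
        });
    }

    /** Queues a read and returns at once; the Future yields the value, or null on a miss. */
    Future<String> get(String key) {
        return io.submit(() -> store.get(key));
    }

    @Override
    public void close() {
        io.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (AsyncCacheSketch cache = new AsyncCacheSketch()) {
            cache.set("_mybatis_user:1", "Alice");                 // returns immediately
            Future<String> pending = cache.get("_mybatis_user:1"); // also returns immediately
            // Block only when the value is needed, with a timeout.
            System.out.println(pending.get(5, TimeUnit.SECONDS));
        }
    }
}
```

Because a single thread processes the queue in order, the read above always sees the earlier write; the timeout on get plays the same role as the timeout setting in memcached.properties.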

In my testing, reading data that was already in Memcached cut the load time by at least 50% compared with reading it from the DB. This also suggests that the MyBatis built-in cache alone is not big enough, because MyBatis is not designed purely as a cache.

 

How to make EasyUI JS Datagrid to support filter and scrollview together

W.ZH

Issue: When an EasyUI datagrid has millions of rows, you need pagination or scrollview to support it. But you might find that your filter no longer works properly after you add the scrollview or pagination feature.

The filter may conflict with the pagination or table scroll feature. The most common symptom is that the filter sometimes works and sometimes does not, so the problem appears to occur randomly in the UI.

Reason: In most cases you are using the local filter while the datagrid data is fetched from the remote/server side as JSON, so the local filter only sees the rows loaded so far.

Solution: Run the filter and the pagination/scroll both on the server side or both locally. Here is some sample code I created to support filter and scroll together on the remote server side.

You need two more extension JS files:

/resources/js/easyui/datagrid-filter.js
/resources/js/easyui/datagrid-scrollview.js

HTML for table:

<table id="data_list_table" style="height:750px"  ></table>

JS to load the table:

jQuery(document).ready(function() {
            $('#data_list_table').datagrid({
                url : '<%=path%>/web/sampleController/getSampleByPage',
                toolbar : '#data_list_table_toolbar',
                nowrap: true,
                fitColumns:true,
                singleSelect:true,
                rownumbers : true,
                view: scrollview,
                remoteFilter: true,
                pagination : false,
                autoRowHeight:false,
                pageSize : 30,
                columns:[[
                            {field:'sampleid',title:'DB ID',width:35},
                            {field:'samplename',title:'Slot Name',sortable :true, order : 'asc', width:110},
                            {field:'sampletype',title:'Slot Type',sortable :true, order : 'asc', width:95},
                            {field:'sampledesc',title:'Slot Description', width:250}
                        ]]
             });
               // filter for the table
             var dg = $('#data_list_table').datagrid().datagrid('enableFilter');
   });

So every time the table loads, and every time you scroll to the next page of data, the HTTP request contains these parameters:

sort
order
page
rows
filterRules – contains the filter rules, which you need to convert into your SQL WHERE conditions.

Here is a sample that processes the request in Spring and Java:

@RequestMapping(value = "/getSampleByPage", method = { RequestMethod.POST,
            RequestMethod.GET })
    @ResponseBody
    public EASYUIJSONTransport getSampleByPage(SearchInput search,
            HttpServletRequest request) throws Exception {
        int total = sampleService.totalCountSample();
        List<Sample> samples = searchbypage(search); // implement with your own data access method

        return new EASYUIJSONTransport(total, samples);
    }
********************************************************
public class SearchInput {
    private String sort = "samplename";
    private String order = Common.sort_asc;
    private int page = Common.default_page;
    private int rows = Common.default_rows;
    private String filterRules = "";
.............................

}
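
The conversion of filterRules into SQL WHERE conditions is left open above, so here is one possible sketch. It assumes the filterRules JSON array (objects with field, op and value, as sent by the datagrid-filter extension) has already been parsed into Rule objects with whatever JSON library you use. The class FilterRuleSql, the column whitelist and the subset of operators handled are my own choices for illustration, not part of EasyUI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: turns parsed filter rules into a parameterized WHERE clause (hypothetical helper). */
final class FilterRuleSql {

    /** One entry of the filterRules array, already parsed from JSON. */
    static final class Rule {
        final String field;
        final String op;
        final String value;

        Rule(String field, String op, String value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }
    }

    // Only these columns may appear in generated SQL; this blocks injection through "field".
    private static final Set<String> ALLOWED = new HashSet<>(
            Arrays.asList("sampleid", "samplename", "sampletype", "sampledesc"));

    /** Returns a WHERE clause with ? placeholders and appends the bind values to params. */
    static String where(List<Rule> rules, List<Object> params) {
        List<String> parts = new ArrayList<>();
        for (Rule r : rules) {
            if (!ALLOWED.contains(r.field)) {
                continue; // unknown column: skip the rule
            }
            switch (r.op) {
                case "contains":
                    parts.add(r.field + " LIKE ?");
                    params.add("%" + r.value + "%");
                    break;
                case "beginwith":
                    parts.add(r.field + " LIKE ?");
                    params.add(r.value + "%");
                    break;
                case "endwith":
                    parts.add(r.field + " LIKE ?");
                    params.add("%" + r.value);
                    break;
                case "equal":
                    parts.add(r.field + " = ?");
                    params.add(r.value);
                    break;
                case "notequal":
                    parts.add(r.field + " <> ?");
                    params.add(r.value);
                    break;
                default:
                    break; // operator not handled in this sketch
            }
        }
        return parts.isEmpty() ? "" : " WHERE " + String.join(" AND ", parts);
    }

    /** Row offset for LIMIT/OFFSET paging; the "page" request parameter starts at 1. */
    static int offset(int page, int rows) {
        return Math.max(page - 1, 0) * rows;
    }

    public static void main(String[] args) {
        List<Object> params = new ArrayList<>();
        String sql = where(Arrays.asList(
                new Rule("samplename", "contains", "abc"),
                new Rule("sampletype", "equal", "text")), params);
        System.out.println(sql);            // " WHERE samplename LIKE ? AND sampletype = ?"
        System.out.println(params);         // [%abc%, text]
        System.out.println(offset(3, 30));  // 60
    }
}
```

The values go into bind parameters rather than into the SQL string, and the whitelist covers the column names, which cannot be bound. Escaping of % and _ inside LIKE values is not handled here.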

Reference :

http://www.jeasyui.com/extension/datagrid_filter.php

http://www.jeasyui.com/tutorial/datagrid/datagrid27.php

http://www.jeasyui.com/extension/datagridview.php

Convert GoogleSites To WordPress

By W.ZH Apr 2016

Recently I set up this WordPress site and moved my Google Sites articles here. To do that I created a simple tool that reads the Sites pages and converts them into WordPress posts, keeping their original dates.

I have shared the tool on GitHub here:

https://github.com/WayneShare/GoogleSitesToWordPress

GoogleSitesToWordPress, by Wayne. A tool to help convert your Google Sites pages to WordPress posts.

  1. Export your Google Sites pages into a folder using the google-sites-liberation tool (https://github.com/sih4sing5hong5/google-sites-liberation). The tool creates an index.html for each page of the site, in a folder or subfolder.
  2. Find your WordPress account and the URL of its XMLRPC API.
  3. Update the ToolConfig.properties file in the code with your settings.
  4. Run GoogleToWordPress.class, which will: a. search your folder for all index.html files; b. extract the HTML content of your site pages and convert it to WordPress posts; c. write the posts to WordPress via the XMLRPC API.
  5. Feel free to change the code as needed. The Java code is really simple, and it uses Bican's jwordpress library to access WordPress.
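
Step 4a, finding every index.html under the export folder, can be sketched with the JDK alone. This is not the code from the GitHub repository; the class name IndexHtmlFinder and the command-line handling are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: collects all index.html files below an export folder (hypothetical helper). */
final class IndexHtmlFinder {

    /** Walks root recursively and returns every regular file named index.html, sorted by path. */
    static List<Path> find(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("index.html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the export folder as the first argument; defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path page : find(root)) {
            System.out.println(page);
        }
    }
}
```

Each returned path corresponds to one exported page, so the parent folder name can serve as a hint for the post title or slug.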

Enjoy it if it is useful.