Issue: An irrecoverable stack overflow has occurred.


An irrecoverable stack overflow has occurred. Please check if any of your loaded .so files has enabled executable stack (see man page execstack(8))


This issue happens when I call a C++ shared library (.so) from Java through JNA. It is mostly caused by passing a big chunk of data into the interface of the C++ function. Each Java thread has a limited thread stack size, normally 512K-1024K.


You can change the stack size with the -Xss option, but be careful with this when you need to open many threads, because each thread reserves its own stack.






Free online text/code sharing tools

You can immediately open a browser and share the code/text link with other people. You do not even need to register an account to use them.

These are some very useful sites for sharing something remotely on short notice, or for doing a very simple phone interview at a startup.

  5. Google Docs


  1. Not free –
  2. Not free –

Spoken dialogue system open resources

OpenDial – Java

DeepPavlov – Python

Jindigo – Java

jVoiceXML – Java

CMU RavenClaw – C++/Perl

PED – Prolog

OwlSpeak – Java

IrisTK – Java

InproTK – Java, Python

Rivr – Java – VoiceXML


Summary

Commercial cloud services are also available, such as those from Facebook, Microsoft LUIS, Nuance and Google.

Speech Recognition and Speech Synthesis open resources

CMU-Sphinx – C/C++/Java












HTK Toolkit:





Speaker Recognition and Diarization open resources

Speaker Recognition


SIDEKIT – Python
MSR Identity Toolbox – MATLAB
Kaldi – scripting



GMM-UBM i-Vector


Speaker Diarization


Kaldi CALLHOME_diarization – scripting

Pyannote – Python

aalto speech – Python, for segmentation




Brainstorming: extracting text structure from PDF

1. Use existing tools like GROBID.

GROBID uses machine learning to extract structured data from scientific papers. It has a demo page. My tests show that it can extract the title and some other data, but the content may still be mixed with footers and headers. As it is designed for academic documents, it may have issues with other types of PDF.

2. Borrow ideas from GROBID to build a system adapted to the PDF types that you use.
PDFs with a unified format may give better results, but general PDFs will not.

3. Convert the PDF to a doc file, and then use doc tools to extract the content structure?


Some brainstorming ideas:

1). Use a tool like pdfclown to extract the position and style info of the PDF text data.

2). PDFs in the same category share patterned styles and positions, so we have a chance to find the structure of the file.

3). Based on the style of the text, it is possible to build a tree-like text structure, but this tree may not match the real chapter tree. This method can help with section levels and titles.

4). How to find the main content text?
Use statistics on the text fonts: the font with the biggest share of occurrences is normally the main content. As the main content has a different style from the other parts, it is possible to get a good result here.

5). Check each page from the bottom line up toward the center. If the PDF has many pages and a unified footer format, it is possible to find which style and font the footer uses, and to find the footer pattern statistically.

6). Using the same trick as for the footer, it is possible to find the header, if the pages have one.

7). If we have the position and style info of the PDF text data, we may also be able to train a classifier on positions and styles to find the basic structure of the file.
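
Ideas 4), 5) and 6) can be tried quickly before building anything bigger. Below is a minimal sketch in Python, assuming the text runs have already been extracted with their style and position. The Span record, the function names and the min_share threshold are all hypothetical, made up for illustration; they are not the pdfclown API.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Span:
    """One extracted text run (hypothetical structure, not a pdfclown type)."""
    text: str
    font: str
    size: float
    y: float  # distance from the top of the page; larger means lower


def dominant_style(pages):
    """Idea 4: treat the (font, size) pair covering the most characters as body text."""
    counts = Counter()
    for page in pages:
        for span in page:
            counts[(span.font, span.size)] += len(span.text)
    return counts.most_common(1)[0][0] if counts else None


def _normalize(text):
    # Replace digit runs so "Page 3" and "Page 12" count as the same pattern.
    return re.sub(r"\d+", "#", text.strip())


def edge_pattern(pages, bottom=True, min_share=0.6):
    """Ideas 5 and 6: find the line pattern repeated at the bottom (or top) of pages.

    Returns (normalized text, font, size) if at least min_share of the
    non-empty pages share it, otherwise None. min_share is an assumed threshold.
    """
    pick = max if bottom else min
    seen = Counter()
    non_empty = 0
    for page in pages:
        if not page:
            continue
        non_empty += 1
        edge = pick(page, key=lambda s: s.y)
        seen[(_normalize(edge.text), edge.font, edge.size)] += 1
    if not seen:
        return None
    pattern, hits = seen.most_common(1)[0]
    return pattern if hits / non_empty >= min_share else None


if __name__ == "__main__":
    pages = [
        [Span("Report title", "Helvetica-Bold", 18.0, 40.0),
         Span("Body text of the first page, long enough to dominate.", "Times", 10.0, 200.0),
         Span("Page 1", "Helvetica", 8.0, 780.0)],
        [Span("More body text on the second page.", "Times", 10.0, 120.0),
         Span("Page 2", "Helvetica", 8.0, 780.0)],
    ]
    print(dominant_style(pages))              # ('Times', 10.0)
    print(edge_pattern(pages, bottom=True))   # ('Page #', 'Helvetica', 8.0)
    print(edge_pattern(pages, bottom=False))  # None: no repeated header here
```

Counting characters rather than spans keeps a few long body paragraphs from being outvoted by many short labels. A real document would probably need font sizes rounded and the footer check widened to more than one line per page.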


Some similar work and papers

A Chinese patent for this:

Work by a ch team, called Xed


HP paper

A text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories.