Google Natural Language API

Here are some notes summarizing the Google Natural Language API that I wrote three years ago. Some features may have changed a bit; refer here for the latest:

https://cloud.google.com/natural-language/docs/
Refer here for basic concepts:
https://cloud.google.com/natural-language/docs/basics

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Features:

The Google Cloud Natural Language API provides natural language understanding technologies to developers, including:

  • sentiment analysis – English
  • entity recognition – English, Spanish, and Japanese (in practice, analysis of nouns/named things)
  • syntax analysis – English, Spanish, and Japanese

The API has three calls: analyzeEntities, analyzeSentiment, and annotateText. The first two perform one feature each; annotateText can run any combination of features in one call.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
sentiment analysis

Returns values describing the emotional leaning of the text (currently only how negative or positive it is).
The result contains a polarity value and a magnitude value.
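As a toy illustration (not the API itself), a small Python function can turn a polarity/magnitude pair into a label; the 0.25 thresholds are my own assumptions, not API values:

```python
def describe_sentiment(polarity, magnitude):
    """Toy interpretation of a sentiment result.

    polarity: -1.0 (negative) to +1.0 (positive).
    magnitude: 0.0 upward, overall strength of emotion.
    The 0.25 thresholds are illustrative assumptions only.
    """
    if magnitude < 0.25:
        return "neutral/low emotion"
    if polarity > 0.25:
        return "positive"
    if polarity < -0.25:
        return "negative"
    return "mixed"

print(describe_sentiment(0.8, 3.2))   # positive
print(describe_sentiment(-0.6, 1.0))  # negative
```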

entity recognition

Finds the “entities” in the text – prominent named “things” such as famous individuals, landmarks, etc.
Returns the entities along with their URLs (Wikipedia pages, etc.).

syntactic analysis

Syntax analysis returns two things:
1. the sentences/sub-sentences of the input text;
2. the tokens (words) and their grammatical metadata in a syntax dependency tree.
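To make the dependency tree concrete, here is a Python sketch that walks a token list shaped like the syntax output (each token with text, partOfSpeech, and a dependencyEdge holding headTokenIndex and label). The sample tokens are hand-written, not real API output:

```python
# Hand-written sample in the response shape I observed; not real API output.
sample_tokens = [
    {"text": {"content": "John"}, "partOfSpeech": {"tag": "NOUN"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "NSUBJ"}},
    {"text": {"content": "sleeps"}, "partOfSpeech": {"tag": "VERB"},
     "dependencyEdge": {"headTokenIndex": 1, "label": "ROOT"}},
]

def print_dependencies(tokens):
    """Render each token with its grammatical head, flattening the tree."""
    lines = []
    for tok in tokens:
        head = tokens[tok["dependencyEdge"]["headTokenIndex"]]
        lines.append("%s --%s--> %s" % (
            tok["text"]["content"],
            tok["dependencyEdge"]["label"],
            head["text"]["content"]))
    return lines

for line in print_dependencies(sample_tokens):
    print(line)
```

Note the root token points at itself, which is a common convention for dependency roots.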

 

Test steps and commands ++++++++++++++++++++++++++++++++++

gcloud auth activate-service-account --key-file=/your-project-keyfile.json

gcloud auth print-access-token

print-access-token gives you an access token to put in the Authorization header of the commands below. I created three JSON files to test each feature:
entity-request.json
syntactic-request.json
3in1-request.json
So I can use these commands to try the three API calls:

curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://language.googleapis.com/v1beta1/documents:analyzeEntities \
-d @entity-request.json

curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://language.googleapis.com/v1beta1/documents:annotateText \
-d @syntactic-request.json

curl -s -k -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://language.googleapis.com/v1beta1/documents:annotateText \
-d @3in1-request.json

For how to create these JSON input files, please refer to the Google SDK documentation.
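As a hedged sketch of what such an input file can look like (following the request shape in the v1beta1 docs: a document with type and content, plus an encodingType; the sample sentence is my own), entity-request.json could be generated like this:

```python
import json

# Minimal entity-request.json in the documented request shape.
# The sample sentence is an assumption for illustration.
request = {
    "document": {
        "type": "PLAIN_TEXT",
        "content": "Larry Page founded Google in 1998.",
    },
    "encodingType": "UTF8",
}

with open("entity-request.json", "w") as f:
    json.dump(request, f, indent=2)
```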

 

 

 


Spoken dialogue system open resources

OpenDial – Java

https://github.com/plison/opendial

https://pdfs.semanticscholar.org/7981/324bcad5812ccf789d2091414e19138047dc.pdf

DeepPavlov – Python

https://github.com/deepmipt/DeepPavlov

Jindigo – Java

http://www.speech.kth.se/jindigo/

jVoiceXML – Java

https://github.com/JVoiceXML/JVoiceXML

CMU RavenClaw – C++/Perl

https://www.cs.cmu.edu/~dbohus/ravenclaw-olympus/index-dan.html

PED – Prolog

http://planeffdia.sourceforge.net/main/

OwlSpeak – Java

https://sourceforge.net/projects/owlspeak/

IrisTK – Java

http://www.iristk.net/index.html

InproTK – Java, Python

https://bitbucket.org/inpro/inprotk

Rivr – Java – voiceXML

https://github.com/nuecho/rivr/#overview

 

summary ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

https://github.com/EllaVator/EllaVator/wiki/Open-source-dialog-frameworks

Commercial cloud offerings include Facebook's wit.ai, Microsoft LUIS, Nuance, and Google api.ai.

Speech Recognition and Speech Synthesis open resources

CMU-Sphinx – C/C++/Java

Kaldi

HTK

Julius

RWTH

simon

iATROS-speech

SHoUT

Zanzibar

OpenIVR

MSDN-SAPI: http://msdn.microsoft.com/zh-cn/library/ms723627.aspx

CMU-Sphinx: http://en.wikipedia.org/wiki/CMU_Sphinx

HTK Toolkit: http://htk.eng.cam.ac.uk/

Julius: http://en.wikipedia.org/wiki/Julius_(software)

RWTH ASR: http://en.wikipedia.org/wiki/RWTH_ASR

http://en.wikipedia.org/wiki/List_of_speech_recognition_software

 

http://ibillxia.github.io/blog/2012/11/24/several-plantforms-on-audio-and-speech-signal-processing/

http://zh.wikipedia.org/wiki/语音识别
http://baike.baidu.com/view/549184.htm

 

 

Speaker Recognition and Diarization open resources

Speaker Recognition

ALIZE/LIA_RAL – C++

https://github.com/ALIZE-Speaker-Recognition/LIA_RAL

SIDEKIT – Python
MSR Identity Toolbox – MATLAB
Kaldi – scripting
Examples
===========================================================================

Discussion

http://habla.dc.uba.ar/gravano/ith-2014/presentaciones/Dehak_et_al_2010.pdf

 

GMM-UBM i-Vector

http://cslt.riit.tsinghua.edu.cn/mediawiki/images/c/cb/131104-ivector-microsoft-wj.pdf

https://people.csail.mit.edu/sshum/talks/ivector_tutorial_interspeech_27Aug2011.pdf

https://speechlab.sjtu.edu.cn/pages/sw121/homepage/2016/05/20/ivector-tutorial/

https://blog.csdn.net/xmu_jupiter/article/details/47209961

https://blog.csdn.net/zhangxueyang1/article/details/66971997

 

Speaker Diarization

LIUM – Java

http://www-lium.univ-lemans.fr/diarization/doku.php/welcome

https://github.com/StevenLOL/LIUM

kaldi CALLHOME_diarization – scripting

https://github.com/kaldi-asr/kaldi/tree/master/egs/callhome_diarization

https://github.com/Jamiroquai88/VBDiarization

Pyannote – python

https://github.com/pyannote/pyannote-audio

Aalto speech – Python, for segmentation

https://github.com/aalto-speech/speaker-diarization

 

 

 

Text structure extraction from PDF – brainstorming

1. Use existing tools like grobid.

https://github.com/kermitt2/grobid

It uses machine learning to extract scientific-paper structure data. It has a demo page here: http://cloud.science-miner.com/grobid/ . My tests show that it can extract the title and some other data, but the content may still be mixed with footers and headers. Since it is designed for academic documents, it may have issues with other types of PDF.

2. Borrow ideas from grobid to build a system adapted to the PDF types you actually use.
PDFs with a unified format may give better results, but this does not generalize to arbitrary PDFs.

3. Convert the PDF to a doc file, then use doc tools to extract the content structure?

 

Some brainstorming ideas:

1). Use a tool like pdfclown to extract the position and style info of the PDF text data.

2). Since PDFs of the same category share patterned styles and positions, we have a chance to find the structure of the file.

3). Based on the style of the text, it is possible to build a tree-like text structure, though this tree may not match the real chapter tree. This method can help with section levels and titles.

4). How do we find the main content text?
Using statistics on the text fonts: the font with the biggest share of occurrences is normally the main content. Since the main content is styled differently from the other parts, it is possible to get good results here.

5). By checking from the bottom of each page up toward the center's first line: if the PDF has many pages and a unified footer format, it is possible to find the footer's style and font, and to discover the footer pattern statistically.

6). The same trick can find the header, if the pages have one.

7). If we have the position and style info of the PDF text data, we may also be able to train a classifier on positions and styles to find the basic structure of the file.
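Ideas 4) and 5) above can be sketched in a few lines of Python. The (text, font, size, y) records are hypothetical stand-ins for what a tool like pdfclown or PDFBox could extract; no real PDF parsing happens here:

```python
from collections import Counter

# Hypothetical per-line records: (text, font_name, font_size, y_position).
lines = [
    ("Chapter 1", "Times-Bold", 16.0, 700),
    ("Some body text ...", "Times-Roman", 10.0, 650),
    ("More body text ...", "Times-Roman", 10.0, 630),
    ("Even more body text", "Times-Roman", 10.0, 610),
    ("Page 1", "Helvetica", 8.0, 30),
]

def body_font(records):
    """Idea 4): the most frequent (font, size) pair is assumed to be the main content."""
    counts = Counter((font, size) for _, font, size, _ in records)
    return counts.most_common(1)[0][0]

def footer_lines(records, y_threshold=50):
    """Idea 5): lines near the page bottom are footer candidates; a real
    version would confirm the pattern statistically across many pages."""
    return [text for text, _, _, y in records if y < y_threshold]

print(body_font(lines))      # ('Times-Roman', 10.0)
print(footer_lines(lines))   # ['Page 1']
```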

 

Some similar work and papers

a Chinese patent for this:
https://patents.google.com/patent/CN107358208A/zh

work from a Swiss team (University of Fribourg), called Xed

https://diuf.unifr.ch/main/diva/research/research-projects/xed
http://diuf.unifr.ch/diva/siteDIVA04/publications/XedDIAL04.pdf

LA-PDFText
https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-7-7
https://github.com/GullyAPCBurns/lapdftext

HP paper
https://www.researchgate.net/publication/220932927_Layout_and_Content_Extraction_for_PDF_Documents
https://link.springer.com/content/pdf/10.1007%2F978-3-540-28640-0_20.pdf

a text classification algorithm
https://www.sciencedirect.com/science/article/pii/S153204641630017X
It follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories.
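As a toy illustration of the sieve idea (the rules below are simplified stand-ins I made up, not the paper's actual classifier): each pass is a high-precision rule, the first pass that fires assigns the category, and anything that survives every pass defaults to body text.

```python
def classify_snippet(snippet, page_position):
    """Toy multi-pass sieve. page_position is the snippet's relative
    vertical position on the page (0.0 = top, 1.0 = bottom).
    All rules and thresholds are illustrative assumptions."""
    sieve = [
        (lambda s, y: y < 0.05 or y > 0.95, "METADATA"),        # header/footer zone
        (lambda s, y: s.isupper() and len(s) < 60, "TITLE"),    # short all-caps line
        (lambda s, y: s.lower().startswith("abstract"), "ABSTRACT"),
        (lambda s, y: len(s) < 40, "SEMISTRUCTURE"),            # short fragment
    ]
    for rule, category in sieve:
        if rule(snippet, page_position):
            return category
    return "BODYTEXT"  # default for anything that passes every sieve

print(classify_snippet("Page 3", 0.98))               # METADATA
print(classify_snippet("A DEEP STUDY OF PDFS", 0.9))  # TITLE
```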

Tools for PDF text extract

PDF Tools list

JPedal (commercial software)
https://www.snowtide.com/ (commercial software)
iText (commercial software)

Apache Tika


Grobid

LA-PDFText
pdftotext
pdftohtml
pdftoxml
PdfBox
pdf2xml
PdfMiner
pdfXtk
pdf-extract
pdfx
PDFExtract

Icecite

 

1. Tool comparisons and benchmarks

http://okfnlabs.org/blog/2016/04/19/pdf-tools-extract-text-and-data-from-pdfs.html
http://ad-publications.informatik.uni-freiburg.de/benchmark.pdf

PDF format
https://stackoverflow.com/questions/88582/structure-of-a-pdf-file

2. How to remove headers and footers from a PDF

http://www.massapi.com/class/pd/PDFTextStripperByArea.html
https://www.programcreek.com/java-api-examples/index.php?api=org.apache.pdfbox.util.PDFTextStripperByArea
http://what-when-how.com/itext-5/parsing-pdfs-part-2-itext-5/

3. Find paragraphs in text

https://stackoverflow.com/questions/39196676/how-to-read-a-paragraph-from-a-file-in-java

https://stackoverflow.com/questions/14990619/getting-paragraph-count-from-tika-for-both-word-and-pdf

4. Find sentences

https://stackoverflow.com/questions/9492707/how-can-i-split-a-text-into-sentences-using-the-stanford-parser
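A naive baseline for both splitting tasks (paragraphs and sentences) can be written with regular expressions; real tools like Tika or the Stanford parser, as the links above discuss, handle abbreviations and layout far better. This is only a sketch:

```python
import re

def split_paragraphs(text):
    """Naive: paragraphs are separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    """Naive: split after terminal punctuation followed by whitespace.
    Breaks on abbreviations like 'Dr.' - a real splitter is needed there."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

doc = "First para. Two sentences!\n\nSecond para."
print(split_paragraphs(doc))                          # ['First para. Two sentences!', 'Second para.']
print(split_sentences("First para. Two sentences!"))  # ['First para.', 'Two sentences!']
```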

5. Structure extraction from PDF

 

https://github.com/kermitt2/grobid
https://diuf.unifr.ch/main/diva/research/research-projects/xed
http://diuf.unifr.ch/diva/siteDIVA04/publications/XedDIAL04.pdf
https://github.com/GullyAPCBurns/lapdftext
https://www.researchgate.net/publication/220932927_Layout_and_Content_Extraction_for_PDF_Documents
https://link.springer.com/content/pdf/10.1007%2F978-3-540-28640-0_20.pdf
https://www.sciencedirect.com/science/article/pii/S153204641630017X

How to build and run the CMU Olympus-Ravenclaw dialog system framework – 3

How does the Olympus system work? This is a summary I wrote after reading some of the Olympus code.

1. How is the system started?

Open your SystemRun.bat; it calls this line:

START "" /DConfigurations\%RunTimeConfig% "%OLYMPUS_ROOT%\Agents\Pythia\dist\process_monitor.exe" %RunTimeStartList%.config

For the startlist.config file and Pythia's process_monitor.exe (MITRE in the old project), you need to read these pages:

http://wiki.speech.cs.cmu.edu/olympus/index.php/Pythia

http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/contrib/MITRE/tools/docs/process_monitor_tut.html
http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/index.html
Pythia is a Windows process manager that controls the starting and stopping of many processes. Built as process_monitor.exe, it reads the startlist.config file to control each process of the system.

2. How is each module started?

Pythia starts many processes that communicate with each other. Each process serves as a module of the system, such as ASR or TTS. Together they make the whole system work.

Example:
TTYRecognitionServer is the module that interfaces with the terminal for audio and keyboard input. Pythia starts it via the file ttyserver_start.config, which in effect runs this command as one process:

$OLYMPUS_ROOT\bin\release\TTYRecognitionServer.exe
--start – tells Pythia to start it
--input_line – tells Pythia to open an input box for it on the GUI

3. What is the HUB?

What each process does, and how the processes communicate with each other, is now the key question. Each process is called a server, and there is a HUB that links all the servers together:

http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/index.html
http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/reference/hub.html

So what is a server?
http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/reference/adding.html

4. So how do the hub and servers exchange data?

A pgm file defines all the servers' info – names, ports, rules, etc. The hub reads this file and links all the servers so they can exchange data.
Rules in programs (like main) tell Galaxy what the Hub should do when it gets a certain message.

http://wiki.speech.cs.cmu.edu/olympus/index.php/How_does_the_hub_work

http://communicator.sourceforge.net/sites/MITRE/distributions/GalaxyCommunicator/docs/manual/tutorial/how_it_works.html

Now you have the basic structure of the whole system.

http://wiki.speech.cs.cmu.edu/olympus/index.php/How_The_CMU_Communicator_Architecture_Works

 

5. So how are tasks organized in the dialog system (RavenClaw)?

RavenClaw uses a tree to define the task relations, like the example here:

 

A sub-node under a task is a subtask; to finish a task, you go through its subtasks from left to right. RavenClaw defines a set of macros in C, and the developer uses these macros to define the task tree:

An example:

http://trac.speech.cs.cmu.edu/repos/olympus/tutorials/Tutorial1/branches/2.5/Agents/RavenClawDM/DialogTask.cpp