
Data extraction using AlchemyAPI

Because I am very interested in information extraction from textual data from a research perspective, I tried out the AlchemyAPI services.

AlchemyAPI is widely regarded as one of the best natural language processing services. It offers APIs for entity extraction, sentiment analysis, keyword extraction, concept tagging, relation extraction, text categorization, author extraction, language detection, text extraction, microformats parsing, feed detection and linked data support. I am mostly interested in entity extraction and relation extraction, as they are the key subtasks of information extraction.

The company offers AlchemyAPI as a hosted service, as an on-premise installation and as custom-built services. The free tier, which is what I use, allows 1,000 requests per day.

To access the services, you first need to register for an access key and agree not to misuse the services. They can then be accessed through the provided SDKs (currently available for Node.js, Python, Java, Android, Perl, C# and PHP) or directly over HTTP, as the API is very straightforward.

Entity extraction normally consists of named entity recognition and coreference resolution. The service does not return coreference clusters, but it performs entity disambiguation and links entities to known linked data sources such as DBpedia, Freebase, YAGO and OpenCyc. An example of extracted data in XML format:

[codesyntax lang="xml" tab_width="4"]

<entity>
    <type>Company</type>
    <relevance>0.451904</relevance>
    <count>1</count>
    <text>Facebook</text>
    <disambiguated>
        <name>Facebook</name>
        <subType>Website</subType>
        <subType>VentureFundedCompany</subType>
        <website>http://www.facebook.com/</website>
        <dbpedia>http://dbpedia.org/resource/Facebook</dbpedia>
        <freebase>http://rdf.freebase.com/ns/m.02y1vz</freebase>
        <yago>http://yago-knowledge.org/resource/Facebook</yago>
        <crunchbase>http://www.crunchbase.com/company/facebook</crunchbase>
    </disambiguated>
</entity>
<entity>
    <type>Person</type>
    <relevance>0.44154</relevance>
    <count>1</count>
    <text>Blair Garrou</text>
</entity>
<entity>
    <type>OperatingSystem</type>
    <relevance>0.43957</relevance>
    <count>1</count>
    <text>Android</text>
</entity>

[/codesyntax]
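
To work with such a response programmatically, the entity elements can be read with any XML parser. Below is a minimal sketch in Python using only the standard library. The sample is trimmed from the output above; the `<entities>` wrapper element, the `ENTITY_XML` constant and the `parse_entities` helper are my own assumptions for illustration and are not part of the AlchemyAPI SDKs.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the entity output shown above. The <entities> wrapper is an
# assumption, added so that the fragment has a single root element.
ENTITY_XML = """
<entities>
    <entity>
        <type>Company</type>
        <relevance>0.451904</relevance>
        <count>1</count>
        <text>Facebook</text>
        <disambiguated>
            <name>Facebook</name>
            <dbpedia>http://dbpedia.org/resource/Facebook</dbpedia>
        </disambiguated>
    </entity>
    <entity>
        <type>Person</type>
        <relevance>0.44154</relevance>
        <count>1</count>
        <text>Blair Garrou</text>
    </entity>
</entities>
"""


def parse_entities(xml_text):
    """Hypothetical helper: turn <entity> elements into plain dictionaries."""
    root = ET.fromstring(xml_text)
    entities = []
    for node in root.iter("entity"):
        # Not every entity is disambiguated, so the block may be missing.
        disambiguated = node.find("disambiguated")
        entities.append({
            "type": node.findtext("type"),
            "text": node.findtext("text"),
            "relevance": float(node.findtext("relevance", default="0")),
            "count": int(node.findtext("count", default="0")),
            "dbpedia": (disambiguated.findtext("dbpedia")
                        if disambiguated is not None else None),
        })
    return entities


if __name__ == "__main__":
    for entity in parse_entities(ENTITY_XML):
        print(entity["type"], entity["text"], entity["dbpedia"])
```

Entities without a `<disambiguated>` block, such as the person in the example, simply get `None` for the linked data field.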

For relation extraction, an example of the returned XML looks as follows:

[codesyntax lang="xml" tab_width="4"]

<relation>
    <subject>
        <text>Madonna</text>
    </subject>
    <action>
        <text>enjoys</text>
        <lemmatized>enjoy</lemmatized>
        <verb>
            <text>enjoy</text>
            <tense>present</tense>
        </verb>
    </action>
    <object>
        <text>tasty Pepsi</text>
        <sentiment>
            <type>positive</type>
            <score>0.069017</score>
        </sentiment>
        <sentimentFromSubject>
            <type>positive</type>
            <score>0.0624976</score>
        </sentimentFromSubject>
        <entities>
            <entity>
                <type>Company</type>
                <text>Pepsi</text>
            </entity>
        </entities>
    </object>
</relation>

[/codesyntax]
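
A relation like this maps naturally onto a subject-action-object triple. The following is a minimal sketch, assuming a single `<relation>` element shaped like the one above; the `RELATION_XML` constant and the `relation_to_triple` helper are hypothetical names of mine, not part of any AlchemyAPI SDK.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the relation output shown above.
RELATION_XML = """
<relation>
    <subject>
        <text>Madonna</text>
    </subject>
    <action>
        <text>enjoys</text>
        <lemmatized>enjoy</lemmatized>
    </action>
    <object>
        <text>tasty Pepsi</text>
        <sentiment>
            <type>positive</type>
            <score>0.069017</score>
        </sentiment>
    </object>
</relation>
"""


def relation_to_triple(xml_text):
    """Hypothetical helper: reduce a <relation> to (subject, action, object, score)."""
    relation = ET.fromstring(xml_text)
    subject = relation.findtext("subject/text")
    # Use the lemmatized verb so that "enjoys" and "enjoyed" map to one action.
    action = relation.findtext("action/lemmatized")
    obj = relation.findtext("object/text")
    # The object sentiment is treated as optional in this sketch.
    score = relation.findtext("object/sentiment/score")
    return subject, action, obj, (float(score) if score is not None else None)


if __name__ == "__main__":
    print(relation_to_triple(RELATION_XML))
```

Using the lemmatized action rather than the surface form is a design choice that makes it easier to group relations extracted from different sentences.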

I have also tried tools made available by several research groups, but AlchemyAPI seems to work with very high precision on real-life data. Within the field of NLP, AlchemyAPI is also one of the best and most comprehensive suites.
