Hello everybody,

This is the 4th part of the BlogProject series. I wanted to point out that my profile has changed (I had to create an external account) but I continue to be the same person. So from now on my posts will be created by this account.

In this part of the series we will see how to utilize the text analysis capabilities provided by SAP HANA and how to use SQLScript to embed our data-intensive application logic into the database.

If you missed the previous posts about the creation of an XS project, the persistence model (especially the data schema in the 2nd part) and the analytic model make sure to have a look:

https://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-1-blogproject-creation

https://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-2-blogproject-persistence-model

https://scn.sap.com/community/developer-center/hana/blog/2014/06/05/introduction-to-hana-xs-application-development-part-3-blogproject-data-modeling-with-information-views

TEXT ANALYSIS

Text analysis in HANA is supported with indexes that are built dependently to certain tables and more specifically to text columns.

In our use case, in part 2, we have created our tables, which include a table called POST that has a column named “Subject”. We intended to add to that column the subject that the text analysis would indicate. And this is what we will do. But first we have to create the analysis that will show us the subject.

To create a simple text analysis we will define an index using SQL console, adding the below lines.

CREATE FULLTEXT INDEX “POST_TEXT_INDEX” ON “BlogProject”.“POST” (“PostText”)

CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’

TEXT ANALYSIS ON;

What we have done here, except from the creation of the index on the specific table and column, is to specify the configuration of the index and enable the text analysis feature of the index. Indexes support a lot of different configurations, but we used EXTRACTION_CORE_VOICEOFCUSTOMER to extract not only the basic entities, like PERSON and COUNTRY, but also sentiment related entities like Sentiment, MinorProblem and Topic.

The definition and properties of the index can be found under the “Indexes” folder of our schema in the Catalog and the content of the index can be found in the generated table located in the “Tables” folder.

I have added some posts to illustrate the result, which is shown in the table below.

 

Let’s see some of the most important columns and their content. To begin with, the column “ID” takes the id of the post that we reference in each row. Next, we have the “TA_TOKEN” column, which includes the part of the text that we analyze in each row. Lastly, the “TA_TYPE” column holds the most important part, the entity that was extracted by the analysis.

For example, in row 3 the highlighted text is “Looking forward for great content”, which the “TA_TYPE” column claims to be sentiment. In the next two rows, the analysis will split into two parts, the word “great” and the word “content”. The first one appears to be a “StrongPositiveSentiment” and the second a “Topic”.

This is basically how the text analysis works. Of course this is a very simple demonstration, as you can build your own indexes based on custom configurations and dictionaries.

So now we have extracted all the information we needed for each post. Next we will create some procedures that will eventually use the text index.

SQLScript (from SQLScript Reference)

Procedures

Procedures are basically used to describe complex dataflow logic, without having any side-effects and can have multiple input parameters and output parameters (which can be of scalar types or table types).

They describe a sequence of data transformations on data passed as input and database tables. Data transformations can be implemented as queries that follow the SAP HANA database SQL syntax by calling other procedures.

Read-only procedures can only call other read-only procedures. Even though procedures are read-only by default, we will see a way to bypass that feature and create procedure that can transform data.

Table Types: These table types are used to define parameters for a procedure that represent tabular results.

User Defined Functions

Functions are declared on attribute views and on tables and include reusable data-centric functionality usually called by procedures.

Functions can be of two types, scalar or table. A scalar function takes any primitiveSQL type as an input parameter and returns a scalar value. It can be used inside an SQL statement in the same positions that a table’s columns can be used. These positions are the SELECT and WHERE clauses.

On the other hand, a table function can have as input parameter, apart from any primitive SQL type, any table type. The output is always a table. Lastly, a table function is inly called in the FROM clause of an SQL statement.

Noted: Both function types MUST always return something.

Triggers

Triggers are special components that their call is dependent to another event that we specify. They are used to call tables, views, procedures and functions and implement a specific logic that we specify using SQLScript statements.

A trigger can be called before or after an event, such as the execution of an insert, update or delete on a database object.

BlogProject SQLScripting

In our project (see part 3) we have already dealt with the matter of data modeling for reading purposes. Now we will see how we can perform simple insert operations using procedures.

Procedures

To create a procedure we have to right-click on the “procedures” folder of our project and create a new “Stored Procedure”. For our application we will create two procedures that will insert a row to the POST and COMMENT tables.

But first we have to create a new Table Type that we will use as the output parameter of our procedures.

  • Create error table type (TT_ERROR)

The table types can be seen as templates of tables that hold no data, but are basically used as input or output parameters. For our case we will create an error table, which use we will see on the next step. To do so, we have to open the SQL console and type the below lines.

CREATE TYPE “BlogProject”.“TT_ERROR” AS TABLE (

“HTTP_STATUS_CODE” INT CS_INT,

        “ERROR_MESSAGE” NVARCHAR(100) CS_STRING,

        “DETAIL” NVARCHAR(100) CS_STRING )


  • Insert_post

In this procedure we make a simple insert, using specific variables, into the POST table.

PROCEDURE “BlogProject”.“MyBlogProject.procedures::insert_post”

(in row “BlogProject”.“POST”,

out error “BlogProject”.“TT_ERROR”)

       LANGUAGE SQLSCRIPT

       SQL SECURITY INVOKER

       DEFAULT SCHEMA “BlogProject”

       –READS SQL DATA

       AS

BEGIN

declare user string;

declare title string;

declare text string;

declare tag string;

select “UserID”, “PostTitle”, “PostText”, “Tags”

into user, title, text, tag

from :row;

if (:user = ‘ ‘  OR :title = ‘ ‘ OR :text = ‘ ‘ OR :tag = ‘ ‘) then

       error = select 400 as http_status_code,

                     ’empty field’ as error_message,

                     ‘All fields must be filled’ as detail from dummy;

else

       insert into “BlogProject”.“POST”

       values (“BlogProject”.“MyBlogProject.data::postID”.NEXTVAL,

             CURRENT_TIMESTAMP, :user, :title, :tag, null, 0, 0, 0, :text);

     

end if;

END;

First, we extract the data of the input table row into 4 variables (user, title, text and tag).  In the end, inside the insert statement we call the corresponding sequence (“MyBlogProject.data::postID”), we pass the CURRENT_DATE variable for our “Date” column, along with the default values (for Subject null, and 0 for Likes, Dislikes and Views) and the other data that we extracted previously and we want to insert.

In the middle of the script we can see the use of the table type “TT_ERROR” that we created previously. We use it for validation purposes, meaning that if the input does not include the 4 values we want, then the error message is returned.

  • Update_post_subject

In this procedure, what we want to do is update an inserted row in the POST table and add the result of our text analysis to the “Subject” column. So, let’s say that we will use as a subject the PRODUCT “TA_TYPE”, to see what our bloggers talk about. And if there is no PRODUCT reference, we want to see the ORGANIZATIONS that are referenced. Lastly, if none of them is referenced, we will leave the “Subject” column empty.

PROCEDURE “BlogProject”.“MyBlogProject.procedures::update_post_subject”

(in postID VARCHAR(10))

       LANGUAGE SQLSCRIPT

       SQL SECURITY INVOKER

       DEFAULT SCHEMA “BlogProject”

       –READS SQL DATA

       AS

BEGIN

       declare token string :=null;

       declare found int := 0;

       subject = SELECT “TA_TOKEN”, COUNT(TA_TOKEN) as “c”

              FROM “BlogProject”.“$TA_POST_TEXT_INDEX”

              WHERE “TA_TYPE” = ‘PRODUCT’ and “ID” = :postID

              GROUP BY “TA_TOKEN”

              ORDER BY “c” DESC

              LIMIT 1;

     

       counter = SELECT COUNT(*) as “c” FROM :subject;

     

       select “c”

       into found

       from :counter;

                                 

       if :found = 0 then

     

              subject = SELECT “TA_TOKEN”, COUNT(TA_TOKEN) as “c”

                     FROM “BlogProject”.“$TA_POST_TEXT_INDEX”

                     WHERE “TA_TYPE” = ‘ORGANIZATION/COMMERCIAL’

                           and “ID” = :postID

                     GROUP BY “TA_TOKEN”

                     ORDER BY “c” DESC

                     LIMIT 1;

       end if;      

       select “TA_TOKEN”

       into token

       from :subject;

       UPDATE “BlogProject”.“POST”

       SET “Subject” = :token

       WHERE “ID” = :postID;

END;

First, we want to extract any PRODUCT references. So, in the “subject” variable, we store the “TA_TOKEN”, basically the part of the post, that appears the most inside the post text, and the times it appears.

Then we extract the times the “TA_TOKEN” appeared and store it into the “found” variable. If “found” is 0, that means that there is no PRODUCT entity inside the post and we proceed with the extraction of any “ORGANIZATION/COMMERCIAL” entities.

Either way, we store the “TA_TOKEN” in the variable “token”, which can be a PRODUCT, ORGANIZATION/COMMERCIAL or null, if neither was referenced in the post.

Lastly, we perform the update statement.

  • Insert_comment

Now we will create the “insert_comment” procedure, that will insert a row in the COMMENT table. To do that, we follow the same logic with the “insert_post” procedure.

PROCEDURE “BlogProject”.“MyBlogProject.procedures::insert_comment”

(in row “BlogProject”.“COMMENT”,

out error “BlogProject”.“TT_ERROR”)

       LANGUAGE SQLSCRIPT

       SQL SECURITY INVOKER

       DEFAULT SCHEMA “BlogProject”

       –READS SQL DATA

       AS

BEGIN

declare user string;

declare post string;

declare text string;

select “UserID”, “PostID”, “Text”

into user, post, text

from :row;

if (:user = ‘ ‘  OR :post = ‘ ‘ OR :text = ‘ ‘) then

       error = select 400 as http_status_code,

                     ’empty field’ as error_message,

                     ‘All fields must be filled’ as detail from dummy;

else

       insert into “BlogProject”.“COMMENT”

       values (“BlogProject”.“MyBlogProject.data::commentID”.NEXTVAL,

        :user, CURRENT_TIMESTAMP, :post, :text);

     

end if;

END;

Note 1: After the creation of the procedures, if we try to activate them, an error will occur informing us that the procedures are READ_ONLY. The first part of solving this is deleting the “READS SQL DATA” line, as shown in the previous code. Then, we have to go to our “Systems” view -> right click on our system -> select “administration” -> “configuration” -> expand the “indexserver.ini” file -> choose “repository” and change the sqlscript_mode variable to UNSECURE. Now, the activation will be performed successfully.

Note 2: Inside the procedures and the other scripted components (functions, triggers), we can use the Calculated Engine operators for better performance (SQLscript Reference p.89).

Call procedures

To call a procedure from the HANA studio we will need the SQL console. Inside the console the procedure call seems like that:

call “MyBlogProject.procedures::insert_post”(“BlogProject”.“POST”,?)

The parameters inside the parenthesis represent the input