SAP HANA Self Learning as Never Before (Part 1) – the first lesson for startups to learn HANA
As a technology advisor to the startups in the SAP Startup Focus Program, I have the opportunity to work with innovative startups from many different areas. SAP Startup Focus is a 12-month global program for startups with big data, predictive analytics and/or real-time data decision solutions. We make SAP HANA available to the startup community and help eligible startups accelerate the development of their solutions. We also help startups with validated HANA solutions accelerate market traction.
I work in the second phase of the program, which we call the Development Accelerator. In this phase we help startups build a Minimum Viable Product (MVP) during a one-year period of free technical support. As a hands-on member of the team, my job is to help startups solve all kinds of technical problems and to advise on architecture design. I also own the technical direction and the creation of technical content for startup education. My team has run many prototyping workshops (one-day classroom trainings) to train engineers from startups all over the world.
I studied some of the existing educational content. Thanks go to the SAP product and development teams for creating the amazing SAP HANA Interactive Education (SHINE), which is a solid start for developers new to SAP HANA.
We found that we could not simply reuse SHINE because of the diversity of the startups: many of them are not from the enterprise world, and it is hard for them to understand the data model of the SAP EPM system. Besides, startups want something fun, so I decided to create something interesting and closer to the startup mindset. We have used this content to train many startups, and the feedback was that they enjoyed it very much, so I decided to share it with you through a series of blogs. This is the first one, in which I give an overview.
OK, let’s get started. I evaluated many open datasets, including Twitter data (which you may have seen in another blog of mine), LinkedIn data and some others. In the end, CrunchBase data stood out because it is so close to what I wanted. For those who don’t know CrunchBase yet, it is a dataset about startups, investors, competitors, fundings and acquisitions, so you can imagine how close it is to the daily life of a startup.
CrunchBase is a free database of technology companies and start-ups operated by TechCrunch, which comprises around 500,000 data points profiling companies, people, investors, fundings and acquisitions. Below is the number of points for each entity type in CrunchBase:
CrunchBase itself doesn’t compare companies, and it offers no option to aggregate, calculate or even discover the relationships between the various datasets. By loading the data into an in-memory database like SAP HANA and using its data modeling tools or embedded analysis algorithms, some very interesting questions like the ones below can be answered in real time:
- What kinds of companies have a better chance of being invested in or acquired?
- Who are the likely competitors of a company?
- What is the location distribution of companies that have received more than three rounds of investment?
- What are the shortest and the average times to IPO?
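To make the last question concrete, here is a minimal sketch in plain Python of the kind of calculation involved. It is only an illustration of the logic, not HANA code: the company names, dates and the tuple layout are hypothetical and do not come from CrunchBase.

```python
from datetime import date
from statistics import mean

# Hypothetical sample records: (company, founded_on, ipo_on).
# These are made-up values, not real CrunchBase data.
companies = [
    ("AlphaCo", date(2000, 1, 1), date(2005, 1, 1)),
    ("BetaCo", date(2003, 6, 1), date(2006, 6, 1)),
    ("GammaCo", date(2008, 3, 15), None),  # has not gone public
]

def days_to_ipo(records):
    """Days from founding to IPO, for companies that have an IPO date."""
    return {
        name: (ipo - founded).days
        for name, founded, ipo in records
        if ipo is not None
    }

durations = days_to_ipo(companies)
shortest_company = min(durations, key=durations.get)  # fastest to IPO
average_days = mean(durations.values())               # average over IPO'd companies

print(shortest_company, durations[shortest_company], average_days)
```

In HANA the same question would typically be expressed as an aggregation over the company and IPO tables; the sketch only shows what is being computed.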
The diagram below shows the entity relationships. Each company can have zero or more funding rounds, acquisitions, IPOs, people who work or have worked for the company, competitors and offices. The financial organizations are usually venture capitalists.
You can think of many ways to use the data to find insights into the startup and investor community. But don’t forget that our mission here is to use it to demonstrate HANA capabilities. Here are some examples:
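The relationships described above can be sketched as a small set of Python data classes. This is only an illustration of the "one company, zero or more of everything else" shape; the class names, field names and sample values are my own assumptions, not the actual CrunchBase schema or the HANA tables.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FundingRound:
    round_code: str   # hypothetical codes such as "seed", "a", "b"
    raised_usd: int
    investors: List[str] = field(default_factory=list)  # financial organizations

@dataclass
class Company:
    name: str
    # Every relationship is zero-to-many, so each one defaults to an empty list.
    funding_rounds: List[FundingRound] = field(default_factory=list)
    acquisitions: List[str] = field(default_factory=list)
    ipos: List[str] = field(default_factory=list)
    people: List[str] = field(default_factory=list)
    competitors: List[str] = field(default_factory=list)
    offices: List[str] = field(default_factory=list)

    def total_raised(self) -> int:
        """Sum of all funding rounds of this company."""
        return sum(r.raised_usd for r in self.funding_rounds)

# Made-up example company with two funding rounds and one office.
example = Company(name="AlphaCo", offices=["Palo Alto"])
example.funding_rounds.append(FundingRound("seed", 500_000, ["Fund A"]))
example.funding_rounds.append(FundingRound("a", 2_000_000, ["Fund A", "Fund B"]))
print(example.total_raised())
```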
- Modeling: Investment history model to aggregate all the funding records of each financial organization
- SQLScript Procedures: Define proprietary algorithms to calculate startup ranks based on fundings and to analyze the competition landscape
- Text Analysis: Extract sentiment results from company-related information
- Predictive Analysis: Investor clustering
- Geospatial Analysis: Funding and acquisition location distributions
- Visualization: Use SAPUI5 for Mobile and CVOM charts to show funding and acquisition records
- XS Engine (OData & XSJS): Declare OData services or XSJS services to expose data to the UI layer
OK, now let’s take a look at the applications I have created.
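As an illustration of the first item in the list above, the investment history model boils down to grouping funding records by financial organization. Here is a minimal sketch of that aggregation in plain Python. It is not the HANA model itself, and the record layout, fund names and amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical funding records: (financial organization, company, amount in USD).
funding_records = [
    ("Fund A", "AlphaCo", 2_000_000),
    ("Fund A", "BetaCo", 500_000),
    ("Fund B", "AlphaCo", 1_000_000),
    ("Fund A", "AlphaCo", 3_000_000),
]

def investment_history(records):
    """Aggregate total amount, round count and distinct companies per investor."""
    history = defaultdict(lambda: {"total_usd": 0, "rounds": 0, "companies": set()})
    for investor, company, amount in records:
        entry = history[investor]
        entry["total_usd"] += amount
        entry["rounds"] += 1
        entry["companies"].add(company)
    return dict(history)

history = investment_history(funding_records)
for investor, stats in sorted(history.items()):
    print(investor, stats["total_usd"], stats["rounds"], sorted(stats["companies"]))
```

In a HANA model this would correspond to a group-by on the financial organization with sum and count aggregates.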
1. Startup Profile, Ranking, Funding Visualization. (Is Twitter still a startup? Maybe I should use another company as an example.)
2. Competition Analysis: algorithms implemented in SQLScript to find competitors
3. Global Startup Funding Heat-map: uses the SAP HANA Geospatial Engine with Google Maps as the client
4. Investor Clustering by K-means: uses the SAP HANA Predictive Analysis Library
5. Company Sentiment Ratings: uses SAP HANA Text Analysis
6. Discover Startups and Investors: uses most of the SAP HANA Platform features
Investor Profile Page
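For readers who have not met k-means before, the idea behind the investor clustering application (item 4 above) can be sketched in a few lines of plain Python. This is a generic textbook k-means, not the SAP HANA Predictive Analysis Library implementation, and the two features per investor (number of deals, average deal size) are hypothetical values I chose for illustration.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids

# Hypothetical investors as (number of deals, average deal size in $M).
investors = [(2, 0.5), (3, 0.8), (4, 0.6), (20, 10.0), (25, 12.0), (22, 9.0)]
labels, centers = kmeans(investors, centroids=[investors[0], investors[3]])
print(labels)
```

The initial centroids are passed in explicitly to keep the sketch deterministic; real implementations usually pick them randomly or with a seeding strategy.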