To Hadoop or Not

Mention “Big Data” or “Data Analytics” today and you will almost certainly hear of “Hadoop”. Hadoop is often positioned as “The Framework” that will solve all your Big Data needs. This framework from the Apache Software Foundation is open source, i.e. anyone can use it for free, and it can handle huge data sets across large clusters of commodity hardware. These qualities have made it popular in the development community.

Big Data, as the name suggests, refers to huge volumes of data, structured or unstructured. One example is the enormous amount of data on social media sites such as Facebook, Twitter and LinkedIn, generated by the day-to-day interactions of their users: chat messages, images, videos, etc. IoT applications are another source, e.g. process data from industrial or manufacturing units, such as temperature and pressure, which keeps changing in real time and produces huge data sets when measured over a period of time. Other such data includes telecom usage, space or weather data, stock trading data, etc.

Hadoop is handy if your needs are mainly ETL (Extract, Transform and Load) operations. However, do not fall into the Hadoop trap unless you clearly understand your business needs. Ask yourself the following questions before you decide to invest time and money in implementing Hadoop.

  • Do you really have terabytes or petabytes of data to process? Hadoop was designed to handle data volumes of this scale. However, a report from Microsoft states that the majority of jobs process less than 100 GB of data. If your data is smaller than terabytes, you may not need Hadoop. Even if your data runs into terabytes, do you really need to process all of it?
  • Are your data needs real time? If you expect to process data in real time, Hadoop is not the best tool. Hadoop jobs take time to complete, which makes it a very good batch processing tool but a poor fit for real-time work. If your business must interpret data in real time, such as stock market movements that drive real-time buy or sell decisions, Hadoop is not the answer.
  • Do you require a quick response? Your response-time requirements need to be well understood. If your users are not willing to wait a minute or more for results on large data sets, you may have to use other, real-time tools instead of Hadoop.
  • Does your requirement involve complex, computation-intensive algorithms? Hadoop's MapReduce model processes large volumes of data efficiently in parallel: large files are divided into smaller blocks stored across machines, and each block is processed independently. However, this is not suited to computation-intensive requirements with a large number of dependent intermediate steps, e.g. computing the Fibonacci series, where every term depends on the two before it. Some machine learning algorithms also do not fit the MapReduce paradigm, so the pertinent question is whether the business requires heavy use of specialised algorithms. If it does, technology experts need to analyze whether the required algorithms are MapReducible.
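To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job in Python. It does not use Hadoop's actual API (Hadoop jobs are usually written in Java, or in other languages via Hadoop Streaming); the function names are hypothetical, and the three phases only mimic what the framework distributes across machines, including the key-value pairs it passes between phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: list[list[str]]) -> dict[str, int]:
    """Map every split (on a cluster these would run in parallel), then shuffle and reduce."""
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two "splits" standing in for blocks of a large file stored on different machines.
    splits = [["big data big insights"], ["data beats opinions"]]
    print(word_count(splits))
```

Because each map call works on its own line with no shared state, the map phase parallelizes trivially. A Fibonacci computation cannot be split this way, since every term needs the results of the previous steps.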

Hadoop would be your choice when:

  • You want to transform largely unstructured or semi-structured data into usable form.
  • You want to analyze information of a huge set of data to obtain insights, and you have ample time for such analysis.
  • You have overnight batch jobs to process, e.g. daily transaction processing at credit card companies.
  • The insight gained from the analysis stays applicable over a longer period of time, e.g. social behavior analysis on social sites, such as likes and affinity analysis or job suggestions based on browsing history.
  • Your data can be expressed as key-value pairs, which Hadoop processes efficiently.

So… before you rush to select Hadoop as your framework, analyze your needs carefully. Though Hadoop itself is free, implementing it requires effort and money, and the implementation budget may not be small.

Reasons Why Twitter Is Losing Popularity

It is amazing! Twitter, one of the most popular social media platforms, has been losing its sheen at an alarming pace. In fact, the number of active members on Twitter has declined over the last few months. At the end of 2015, Twitter had 305 million active users, against 1.6 billion active Facebook users and 1 billion active Google users.

In September 2015, even Instagram reached 400 million users. Twitter’s stock has plunged as a consequence of its performance, and heads have rolled, including the replacement of CEO Dick Costolo by Jack Dorsey. So, what’s gone amiss with Twitter?

We have investigated a few reasons why Twitter is losing market share and engagement:

  • High Attrition of Customers: Unlike other social media tools, Twitter offers a niche product for its users to share, collaborate and market their offerings. However, as a product, it has not been on par with its counterparts in terms of user engagement. The primary reason could be the high turnover of active users on the platform. During the last 3 months of 2015, Twitter lost about 2 million users, a drop from 307 million in the third quarter of 2015 to 305 million in the fourth quarter, or a decline of about 0.65%.
  • Low Quality of Interaction: The nature of interaction on Twitter, particularly the use of abusive language, has something to do with users fleeing the platform. While Twitter has set up a platform that lets people interact freely with other users and institutions, it has not been able to control flagrant abuse of the platform. In general, abusive, pointless, painful or difficult interactions can scare users away from a platform, and that is exactly what has happened on Twitter.
  • High Noise Level: Twitter excels in content liquidity, meaning that a huge number of tweets, or content pieces, keep flowing into it. However, the signal-to-noise ratio of this content is very low. Links to news articles, blog posts, videos, spam, etc. take users away from Twitter or distract them, and the content they actually want gets lost. Other social media sites like Instagram and Facebook have lower noise levels and fewer distractions.
  • Buried Tweets: Twitter does not restrict the activities of followers or the display of their content. Hence, feeds get flooded with irrelevant content from various sources, and the desired tweets and discussions are lost. For instance, if a user follows a thousand accounts, your tweet is only one of a thousand sources of content in that user's feed. As others keep tweeting, your tweet is pushed down and buried deep below their content.

  • Absence of Visual Marketing: While Twitter is an excellent platform for promptly sharing news, its screens are usually devoid of pictures, images and videos, unlike its competitors. For example, an Instagram or Facebook post is usually more visually appealing than a tweet. Twitter is popular mostly among men, while women show a greater penchant for Facebook and Instagram because of their visual content.

  • Restricted to News Feeds: Twitter is used mostly to share news, i.e. “what is happening”. Contemporaries like Facebook and Instagram present in detail what users are doing or experiencing, so from a social perspective these networks are more interactive than Twitter.
  • Concerns Related to Data Privacy and Security: Data privacy is an important requirement for users in the Internet era. Since Twitter shares users' personal data with third parties, users feel a sense of insecurity on the platform. Identity theft, data compromise and fraud tend to happen more frequently on Twitter than on other social media platforms.
  • Tweet Size too Small for Users: Twitter places a restriction of 140 characters on each tweet. Not all users find this restriction acceptable, and therefore, they prefer to use other platforms where such stringent limitations do not exist.

  • Number One Social Site for Spammers: Spammers are constantly on the prowl on Twitter. It does not restrict followers, so multiple users can spam an account with irrelevant content. Users tweet to promote their websites, articles, blogs and so on, leading to a flood of unwanted information. Spammers may also sneak in and gain access to personal information.

  • Useful to Celebrities: Celebrities tweet to boost their popularity and keep their fans hooked on their updates. Politicians and their adversaries use the platform to throw barbs at each other and create unnecessary commotion.

  • Highly Addictive for Users: Using Twitter can become an addiction that consumes precious time and distracts one from core activities. As a result, many users have decided not to remain active on Twitter anymore.

Is the World a Better Place for Travel?

Worldwide economic slowdown, conflicts and terrorist attacks, and the European refugee crisis appear to have had an impact on the global travel and tourism industry. However, the actual trend shows that the industry has maintained its growth despite all these adversities. The IPK International World Travel Monitor report for 2015 shows 4.5% growth in outbound trips in 2015, with a healthy increase of 4.3% estimated for 2016.

Outbound travel is primarily fueled by Asia Pacific and North America. Germany is the ‘world travel champion’, the leader in outbound travel. The United States follows Germany and continues to be both a leading source of and destination for travel. China holds a leadership position in the travel industry, second only to the USA in terms of spending.

The European economy has improved slightly since 2014, primarily due to the growth of Germany. Overall, Europe is forecast to see net growth of 2.8% in outbound travelers. While Europeans have maintained their momentum, they are likely to travel to safer destinations, avoiding zones of conflict and terrorism. Inbound travel to Europe is also growing well, with expected growth of 3% to 4% in 2016. Travelers from China and other Asia Pacific countries, the USA and Japan are all keen to travel to Europe.

[Figure: Outbound Travel Forecast (Percentage Growth)]

Economic growth has slowed to a certain extent in Asia Pacific, but despite the slowdown the number of travelers has continued to increase, and the projected 6.3% growth in outbound travelers is in line with expectations. According to the IMF's figures for 2015, India's growth rate is the highest at 7.6%, ahead of China, so travel from and to India is expected to increase.

While North America is expected to show good growth in outbound travel, South America’s 1.9% growth is a cause for concern. Brazil and Argentina account for about half of South America’s outbound travel market, and South Americans traditionally travel internationally within their own region. One event that could boost international travel to Brazil is the Olympic Games, to be hosted in Rio de Janeiro this year. The 2014 Football World Cup in Brazil brought more than half a million visitors to the country, and a similar trend is expected during the Olympics.

The Middle East travel market is one of the fastest growing, with countries like Saudi Arabia and the United Arab Emirates (UAE) leading the way. The region is noted for travelers with deep pockets who usually travel for long durations (the average trip exceeds 14 nights). Also, more than 30% of travelers are immigrants travelling to meet friends and relatives. Inbound travel to the Middle East, however, has been seriously hurt by the ongoing conflicts in the region.

Last but not least, social media plays a vital role in international travel today. 70% of international travelers are active users of social media such as Facebook, Twitter, WhatsApp, LinkedIn and Google+, and about 30% actively use social media to plan their trips. Marketers would do well to devise creative, innovative approaches to influence how this segment of buyers plans its trips.

Do Not Let Your Data Kill You – The Need for 3 R’s – Reduce, Recycle and Reuse

As the saying goes, anything in excess is a waste. Isn’t that true of information today? Information, or “data”, the four-letter word most representative of the digital world, has overwhelmed you, me and everyone who moves in this space. Data in this form goes by various names: the popular “Big Data”, large or complex data, humongous data, etc.

On average, company data has been growing at a rapid pace, about 100% or more every year. With social media users being overactive, data transactions have also multiplied manifold in real time. Though technical advances make it possible to store this data in large repositories, there is a need to derive context, i.e. meaningful information, so as to Reduce, Recycle and Reuse data. For example, companies would like to use their data to understand and interpret employee interactions, communications and client engagements. Data that is not used but occupies valuable repository space is a costly waste and needs to be eliminated. Regulations also require companies to use data to create statutory reports that can be audited easily if the need arises. Put into practice, the 3 R’s improve data management in a business environment:

Reduce: Regulatory requirements for data, e.g. PCI data storage requirements or other information governance and compliance standards, require one to be circumspect before planning any reduction of data. Because cleaning up data is challenging, organizations not only accumulate a large volume of unused data, but users also save copies in local repositories, which the IT team subsequently backs up.

So how do I reduce unused data? A Document Retention Policy, specifying the criteria for holding or removing data, the process governing such decisions, and the owners who implement and oversee it, is the first proactive step any company can take to ensure that only appropriate data is maintained. With a policy in place, the discipline to actually enforce it enables a large reduction in unused data.
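A retention policy ultimately boils down to rules a script can check. The following Python sketch is one hypothetical way to encode such rules: the categories, retention periods and the `Record` and `records_to_remove` names are illustrative assumptions, not taken from PCI or any other standard. It flags records past their retention period, while keeping anything under legal hold or not yet covered by the policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention rules: record category -> how long it must be kept.
# Real periods must come from your own legal and compliance review.
RETENTION_POLICY: dict[str, timedelta] = {
    "invoice": timedelta(days=7 * 365),
    "chat_log": timedelta(days=90),
    "temp_export": timedelta(days=30),
}


@dataclass
class Record:
    name: str
    category: str
    created: date
    legal_hold: bool = False  # records under legal hold are never removed


def records_to_remove(records: list[Record], today: date) -> list[str]:
    """Return the names of records whose retention period has expired.

    Records with a category the policy does not cover are kept, so nothing
    is deleted until the policy explicitly addresses it.
    """
    expired = []
    for rec in records:
        period = RETENTION_POLICY.get(rec.category)
        if period is None or rec.legal_hold:
            continue
        if today - rec.created > period:
            expired.append(rec.name)
    return expired


if __name__ == "__main__":
    inventory = [
        Record("q1_invoices.pdf", "invoice", date(2015, 3, 31)),
        Record("support_chat.txt", "chat_log", date(2015, 11, 1)),
        Record("report_draft.xlsx", "temp_export", date(2016, 1, 10), legal_hold=True),
        Record("notes.doc", "uncategorized", date(2010, 1, 1)),
    ]
    print(records_to_remove(inventory, today=date(2016, 3, 1)))
```

Defaulting to "keep" for unknown categories is a deliberately conservative choice, in line with the caution regulations demand before any data is reduced.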

Recycle: Regulatory reporting is an important aspect of many industries. For example, in the US the health industry is well regulated, and its reports are mandatory, mattering not only to the companies but also to patients. Taxation and financial obligations also require statutory reporting and audits. It is therefore important to recycle data by processing it into useful reports for auditors and statutory authorities. Intelligent software and ETL techniques usually help in recycling such data.
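As a hedged illustration of the extract, transform and load steps used to recycle raw data into a report, the sketch below uses only Python's standard library. The CSV layout, the per-region totals and the function names are assumptions made for this example; a real pipeline would read from operational systems and load into a reporting database.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: one row per transaction, with inconsistent formatting.
RAW_CSV = """date,region,amount
2015-01-05, North ,120.50
2015-01-09,south,80.00
2015-02-11,North,99.50
2015-02-20,,15.00
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read rows from the source (here a CSV string instead of a real system)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> dict[str, float]:
    """Transform: clean fields, drop incomplete rows, and total amounts per region."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        region = row["region"].strip().title()
        if not region:
            continue  # rows with no region cannot be attributed in the report
        totals[region] += float(row["amount"])
    return dict(totals)


def load(totals: dict[str, float]) -> str:
    """Load: write the aggregated figures into a report format (CSV text here)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total_amount"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

Keeping the three steps as separate functions makes each one testable on its own and lets the load target change (file, database, report tool) without touching the cleaning rules.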

Reuse: The most interesting part of data management is the reuse of data. Business Analytics and Business Intelligence offer ways to derive business insights from large data sets and reuse data intelligently. A new discipline, “Data Science”, has evolved in its own right and has been championed by the Harvard Business Review. The HBR article by Thomas H. Davenport and D. J. Patil in fact calls data scientist the “sexiest job of the 21st century”.

A few terms often used for reuse of data are:

  • Data Science: A term that loosely covers the combination of computer science, analytics, statistics and data modeling. While some companies have developed their own courses or certifications, it still needs to mature as a science with comprehensive tenets and an elaborate literature.
  • Smart Data: Smart data is usually a subset of Big Data with the noise filtered out. While Big Data is characterized by its attributes of variety, velocity and volume, smart data is usually characterized by velocity and value. Smart data is a key ingredient of intelligent BI reporting.
  • Predictive Analytics: Uses data with machine learning techniques and statistical algorithms to predict future outcomes. Companies benefit from predictive analytics by forecasting and planning important outcomes, e.g. revenue or profit, from past data.
  • Real Time Analytics: Analytics delivered in real time, e.g. stock prices moving up or down; updates on page views, sessions, bounce rates and page navigation; advertisements adjusted dynamically based on the type and frequency of customer usage, etc.
  • Intelligent Decision Systems: Using artificial intelligence together with data helps users make the best, optimized decisions based on a large number of input variables. While still evolving, such systems can be used in a number of areas, such as marketing systems that make offers to customers based on profile analysis, or blocking fraudulent credit card transactions.
  • Data Visualization: Intelligent, interactive pictorial or graphical representations of data help business professionals identify trends and patterns in their data, e.g. sales data by region or by customer profile.
  • Big Data Analytics: No discussion of data reuse is complete without Big Data. Big Data analytics has evolved from companies managing huge data sets, such as oil or telecommunication companies, to social media platforms like Facebook, Twitter and LinkedIn that also involve large data sets. This form of analytics helps uncover hidden patterns, market trends, customer preferences, unknown correlations, etc.
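Predictive analytics spans many techniques; the simplest instance of predicting outcomes such as revenue from past data is fitting a trend line. The Python sketch below fits an ordinary least-squares line to hypothetical quarterly revenue figures and extrapolates one quarter ahead. The data and the `fit_trend`/`forecast` names are illustrative assumptions; real forecasting would use richer models and validation.

```python
from statistics import fmean


def fit_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(xs: list[float], ys: list[float], x_next: float) -> float:
    """Predict the value at x_next by extrapolating the fitted trend line."""
    slope, intercept = fit_trend(xs, ys)
    return slope * x_next + intercept


if __name__ == "__main__":
    # Hypothetical quarterly revenue (in millions) for quarters 1 to 4.
    quarters = [1, 2, 3, 4]
    revenue = [10.0, 12.0, 13.5, 15.5]
    print(round(forecast(quarters, revenue, 5), 2))
```

A straight line is only a baseline: it assumes the past trend continues, which is exactly the assumption richer machine learning models try to relax.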

Business Data Analytics is therefore still in its infancy, to be nurtured, developed and evolved over the years. The attraction is immense, and so is the scope of the Data Scientist's job!