In the present day, the use of machines and technologies has gained such prominence that organisations have become heavily reliant on them for deriving valuable insights in order to achieve their goals (Andrejevic and Gates, 2014). At its core, the field of data analysis is built on the principles of statistics, which serve the needs of governments and organisations alike. Data analysis is crucial for organisations because it supports prompt solutions to challenges that arise either on a day-to-day basis or occasionally. Today, this area has evolved into Artificial Intelligence, Business Intelligence and Machine Learning, which are helping companies worldwide to fulfil client needs.
The following report aims to provide a detailed account of the terminology used in the field of Data Analytics. It also covers various types of analysis, such as Exploratory, Descriptive, Predictive and Prescriptive, along with their application through a programming language or tool. In addition, issues related to data preparation and visualisation, as well as assumptions regarding sampling and statistical inference, are discussed in detail.
1.1 Common terminology in ‘data analytics’
Essentially, 'data analytics' can be defined as the process of collecting and examining a large pool of information so as to uncover hidden patterns that give business enterprises valuable insights into their operations. Through this, the business manager is able to draw inferences about a particular problem or situation. Since it is a broad term, this area of analytics comprises a wide array of elements. These are explained below:
- Population:
A pool of information comprising sets of variables with similar characteristics, from which a statistical sample is drawn for the purpose of analysing and gaining insights regarding a particular hypothesis (Chen, Chiang and Storey, 2012).
- Sample:
A set of data chosen from a larger pool of information, usually known as the 'population', that is representative of all the variables present in that pool. It can be said to be a fraction of the population chosen by a researcher for further analysis of a given dataset.
- Categorical Data:
It consists of variables whose values are grouped into categories on the basis of their characteristics or features, such as race, sex and age group. Such data is typically summarised in data tables.
- Ordinal Data:
A type of categorical data whose values follow a natural order, although the distance between successive values is unknown. Hence, it is usually found in the form of a ranking or scale (Gandomi and Haider, 2015). For instance, measuring level of happiness on a scale of 1 to 10.
- Nominal Data:
A type of data that can be neither ordered nor measured and is used to label variables without assigning them any quantitative value. Male and female, as categories of sex, are a prominent example of Nominal Data.
- Continuous Data:
Quantitative data that can take any value within a range, including fractional values, and can therefore be measured and subdivided in a meaningful manner. Examples include temperature, height and weight.
- Discrete Data:
Quantitative data that can take only distinct, countable values, usually integers, and that may also be grouped into categories. Such values cannot be subdivided meaningfully (Hazen et al., 2014).
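The terms above can be illustrated with a minimal Python sketch. The population, its field names and every value in it are invented for illustration; only the standard library is assumed.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Hypothetical population of 1,000 customers; every value is invented.
population = [
    {
        "gender": rng.choice(["male", "female"]),               # nominal
        "satisfaction": rng.choice(["low", "medium", "high"]),  # ordinal
        "visits": rng.randint(0, 12),                           # discrete
        "height_cm": round(rng.uniform(150.0, 195.0), 1),       # continuous
    }
    for _ in range(1000)
]

# Sample: a fraction of the population, drawn at random without replacement.
sample = rng.sample(population, k=100)

# Nominal: labels can only be counted.
gender_counts = Counter(person["gender"] for person in sample)

# Ordinal: categories can be ranked, so a middle rank is meaningful.
LEVELS = ["low", "medium", "high"]
ranks = sorted(LEVELS.index(person["satisfaction"]) for person in sample)
middle_level = LEVELS[ranks[len(ranks) // 2]]

# Discrete and continuous: arithmetic such as the mean is meaningful.
mean_visits = mean([person["visits"] for person in sample])
mean_height = mean([person["height_cm"] for person in sample])

print(gender_counts, middle_level, mean_visits, mean_height)
```

The point of the sketch is that the type of data determines which operations make sense: counting for nominal data, ranking for ordinal data, and arithmetic for discrete and continuous data.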
1.2 Critical evaluation of the use of data analytic methods
With the advent of technologies such as the Internet of Things (IoT) and Integrated Communications, it is easy to retrieve information, whether primary or secondary, from content that is readily available online all around the world. However, in statistical terms, not all the information retrieved is usable for research purposes. Hence, confirming the relevance, reliability and authenticity of the data collected is of paramount importance. For this purpose, an organisation or investigator may use various analytical methods. These are enumerated as follows:
- Descriptive Data Analytics:
This technique enables the user to derive meaning from raw data, collected as primary or secondary content or both, by employing data aggregation and data mining methodologies. For companies, it answers the question “What has happened?”. It describes or summarises the collected data in a manner that enables interpreters to identify past trends, patterns or consumer behaviours that have led to the occurrence of one event or another (Kambatla et al., 2014). This method is most useful for historical insights such as total stock in inventory, average customer visits per year or year-on-year growth in profit. With it, the investigator gains a clear view of the past and can formulate strategies that may materially affect the future outcomes or goals of an organisation. The most common components of this method include sums, averages and percent changes.
- Predictive Data Analytics:
This method enables the investigator to derive meaning from raw data by employing statistical models and forecasting techniques. For companies, it helps answer the question “What could happen?”. Businesses employing this methodology gain actionable insights because its output is predictive in nature: it provides estimates of the likelihood of a given future outcome. Through such a tool, organisations can also take into account events that management may previously have neglected. This technique usually draws on ERP, CRM, HR and POS system data stored in the organisation's database over the years (Kankanhalli et al., 2016). Statistical tools and algorithms are run on this historical information to derive meaningful relationships between the variables under consideration. Common applications include forecasting demand for inputs, forecasting customer behaviour and purchasing patterns, and identifying trends in turnover-related activities.
- Prescriptive Data Analytics:
This method enables the investigator to derive meaning from a given set of accumulated raw data by employing optimisation, business rules, machine learning, computational modelling procedures and simulation algorithms. It is a relatively new field of data analytics that has gained considerable prominence in recent years. For companies, it helps answer the question “What should we do?”. As the name suggests, this technique belongs to the advisory side of analysis. Businesses employing it gain valuable learnings about future outcomes, because Prescriptive Data Analytics takes a given set of information and enables a business manager to understand the impact of future decisions in quantitative terms. Through such a tool, organisations are able not only to predict outcomes but also to obtain recommendations regarding a particular strategic business decision. This analysis also indicates the plan of action the business manager must take in order to follow the recommended pathway (Kotu and Deshpande, 2014). One of its most common applications among business enterprises is optimising the different units that make up the supply chain, such as scheduling, production and inventory.
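The three questions above can be contrasted on a single toy series. The following Python sketch is illustrative only: the sales figures, the unit costs and the candidate stock levels are all assumptions, and the forecast is a simple moving average rather than any particular method from the literature cited.

```python
from statistics import mean

# Hypothetical monthly unit sales; the figures are invented.
sales = [120, 135, 128, 150, 162, 171]

# Descriptive: "What has happened?" Sums, averages and percent changes.
total = sum(sales)
average = mean(sales)
pct_change = [(b - a) / a * 100 for a, b in zip(sales, sales[1:])]

# Predictive: "What could happen?" A three-month moving average forecast.
forecast = mean(sales[-3:])

# Prescriptive: "What should we do?" Choose the stock level with the lowest
# cost if demand equals the forecast. Both unit costs are assumptions.
HOLDING_COST = 2.0   # cost per unsold unit
SHORTAGE_COST = 5.0  # cost per unit of unmet demand


def cost(stock, demand):
    """Cost of holding surplus stock plus cost of unmet demand."""
    surplus = max(stock - demand, 0)
    shortfall = max(demand - stock, 0)
    return HOLDING_COST * surplus + SHORTAGE_COST * shortfall


options = [140, 160, 180]  # hypothetical stock levels to choose from
best_stock = min(options, key=lambda stock: cost(stock, forecast))

print(total, round(average, 1), forecast, best_stock)
```

Each step builds on the previous one: the descriptive figures summarise the history, the forecast extrapolates from it, and the prescriptive step turns the forecast into a recommended action.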
1.3 Summarising the importance of data analytics for businesses
In current times, the importance of data analytics cannot be regarded as trivial. Organisations have become complex structures that require management policies and procedures to be implemented with proper control over all managerial functions. For this purpose, an enterprise may adopt the Exploratory Data Analysis (EDA) methodology, which consists of a variety of techniques. EDA can be regarded as a philosophy of how data analysis should be carried out by a business rather than as a mere tool. Through its implementation, businesses are able to achieve the following goals:
- Variable Identification;
- Detection and treatment of Errors and Omissions or Missing Values;
- Appropriate selection of analytical models suitable for a particular business;
- Determination of explanatory variables and their relationship with outcome variables.
Exploratory Data Analysis may be classified on the basis of graphical and non-graphical methods as well as univariate and bivariate analysis. This relationship has been depicted below:
(Source: Types of EDA, 2019)
The Graphical Methods summarise data in diagrammatic form, whereas the Non-Graphical Methods present it as summary statistics (Manoochehri, 2013). Univariate Analysis scrutinises a single variable at a time, whereas Bivariate Analysis examines two variables together to explore the relationship between them. It is usually recommended to carry out a univariate EDA before performing a bivariate or multivariate investigation of a given set of variables.
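A non-graphical EDA of this kind can be sketched in a few lines of Python. The two variables and their values are hypothetical, `None` is assumed to mark a missing value, and missing values are simply excluded, which is only one of several possible treatments.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical records; None marks a missing value.
ad_spend = [10.0, 12.0, None, 15.0, 18.0, 20.0]
revenue = [110.0, 118.0, 121.0, 140.0, None, 165.0]


def univariate_summary(values):
    """Summarise one variable, reporting how many values are missing."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "stdev": stdev(present),
    }


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Univariate first, then bivariate, as recommended above.
spend_summary = univariate_summary(ad_spend)
revenue_summary = univariate_summary(revenue)
correlation = pearson(ad_spend, revenue)

print(spend_summary, revenue_summary, round(correlation, 3))
```

The univariate step surfaces missing values and the spread of each variable on its own; the bivariate step then indicates whether an explanatory variable is related to the outcome variable.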
In a nutshell, Exploratory Data Analysis helps to simplify the presentation of numeric information in an attractive and easily communicable manner. This matters because managers do not have time to look at each and every variable or table. Through the implementation of EDA in the current business environment, organisations have been able to achieve faster and more reliable communication of important information, which aids informed strategic decision-making in the enterprise. Thus, Data Analytics plays an important role in the success of an organisation.
2.1 Evaluate analytical model data preparation processes
As the name suggests, Data Preparation covers the procedures or actions that help to convert raw data promptly into information that is easily understandable, accessible and analysable. It is the first step in the Data Analytics projects undertaken by organisations all around the world. The data preparation process therefore includes those activities that support the collection, cleaning and summarisation of data into a single body of information that is useful for analytical purposes (Michael and Miller, 2013).
There are many stages to this process, which include the determination of data requirements followed by the meticulous collection, processing, discretization, reduction and aggregation of data. It is important to note that processing may involve semi-structured or unstructured metadata processing, that is, the handling of information about other data. Discretization, on the other hand, converts continuous data attributes into a finite set of intervals while keeping the loss of information to a minimum. Once these steps have been carried out, the data is reduced to a simplified and ordered form, which is eventually aggregated to give a complete perspective on a particular situation.
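The cleaning, discretization and aggregation stages described above can be sketched as follows. The raw records, the interval edges and the band labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (region, order value as text). A blank string
# stands for a value that was never recorded.
raw = [
    ("north", "23.5"),
    ("south", " 41.0"),
    ("north", ""),
    ("south", "67.2"),
    ("north", "35.9"),
]

# Processing: drop blank entries and convert the text to numbers.
cleaned = [(region, float(text)) for region, text in raw if text.strip()]


def discretize(value, edges=(30.0, 60.0), labels=("low", "medium", "high")):
    """Map a continuous value onto a finite set of intervals."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


# Reduction and aggregation: count the orders per region and value band.
counts = defaultdict(int)
for region, value in cleaned:
    counts[(region, discretize(value))] += 1

print(dict(counts))
```

Note that discretization trades precision for simplicity: the choice of interval edges determines how much of the original information survives.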
For analytical purposes, the data may be presented in a graphical or non-graphical manner, and the different types of Exploratory Data Analysis (EDA) are well suited to this. Through the application of EDA, Data Visualisation becomes more effective yet simpler, which makes EDA an important analytical tool and a crucial part of the overall Data Preparation Process. While implementing it, the data preparation process draws on methodologies such as Descriptive Statistics, Inferential Statistics, Statistical Graphics, Plots, Data Analysis, Infographics and Data Science. Their use ensures that both the prepared data and the inferences drawn from it are visualised effectively and efficiently.
2.2 Critically evaluating potential issues in the preparation of data for use in an applied analytical model
Preparing and presenting data is not an easy task, and the analyst may face many risks, hurdles and issues. The first issue is identifying the sources and users of a particular piece of information. Once the users have been identified, the analyst faces the problem of data security and data governance (Miloslavskaya and Tolstoy, 2016). Some data is so sensitive that it must be kept confidential, and the analyst must make sure that the values collected and presented in the analytical model are not leaked. The analyst also has to manage the traceability of data so that it cannot be hacked or tracked by an unauthorised person. Finally, the data used in an analytical model can be so extensive and highly classified that it becomes difficult for a single person to keep track of it.
3.1 Assessing methods to visualise the output from an applied analytical model
Visualising the output of analytical models such as Descriptive Statistics, Probability Distribution, and Sampling and Estimation helps in deriving statistical inferences that are essential to organisations from a strategic point of view. For instance, under the Descriptive method, the output may be presented as a set of summary statistics covering measures such as central tendency, position and dispersion. A probability distribution, on the other hand, would present the output in a tabular format that clearly states the frequencies in a cumulative, discrete or continuous manner (Russom, 2011). Lastly, sampling and estimation would present the output as a point or interval estimate obtained from a systematic or random sample, among others. In this way, the output caters to the needs of the business manager across managerial functions such as operations, production and inventory management.
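The point and interval estimates mentioned above can be sketched in Python. This is a minimal illustration under stated assumptions: the population is simulated, the sample is a simple random sample, and the interval uses the normal approximation, which is reasonable only for fairly large samples.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

rng = random.Random(7)  # fixed seed so the draw is repeatable

# Hypothetical population, simulated for illustration only.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Simple random sample drawn without replacement.
sample = rng.sample(population, 200)

# Point estimate: a single best guess at the population mean.
point_estimate = mean(sample)

# Interval estimate: a 95% confidence interval (normal approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * stdev(sample) / sqrt(len(sample))
interval = (point_estimate - margin, point_estimate + margin)

print(round(point_estimate, 2), tuple(round(bound, 2) for bound in interval))
```

Reporting the interval alongside the point estimate makes the uncertainty of the sample visible to the decision-maker.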
3.2 Application of an appropriate programming language or tool to demonstrate how descriptive analytic techniques contribute to decision-making
Decision-making analysis comprises models and algorithms that help managers to draw informed conclusions and plans of action. Companies all around the world use programming languages or tools such as R or SPSS, which provide valuable insights for decision-making through descriptive analysis techniques (Siemens and Baker, 2012). For instance, if the management of the NHS wants to ascertain the psychological variables regarding depression and anxiety among individuals living across Australia, they may use SPSS to obtain descriptive analytics. This would provide them with the output in a summarised form, as indicated below:
Through the above table, the business manager would be able to ascertain the maximum, minimum, mean, variance, standard deviation, skewness and kurtosis statistics for 1042 males and females who are above 65 years of age. This enables them to focus on creating awareness among such individuals effectively by undertaking various campaigns.
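The statistics listed above can also be computed outside SPSS. The following Python sketch uses invented scores and simple moment-based definitions of skewness and excess kurtosis; SPSS applies sample-adjusted formulas, so its figures would differ slightly from these.

```python
from statistics import mean, pstdev, pvariance


def describe(values):
    """Descriptive statistics using population (moment-based) formulas."""
    n = len(values)
    m = mean(values)
    sd = pstdev(values)
    skewness = sum((v - m) ** 3 for v in values) / (n * sd ** 3)
    kurtosis = sum((v - m) ** 4 for v in values) / (n * sd ** 4) - 3
    return {
        "n": n,
        "minimum": min(values),
        "maximum": max(values),
        "mean": m,
        "variance": pvariance(values),
        "std_dev": sd,
        "skewness": skewness,  # 0 for a symmetric distribution
        "kurtosis": kurtosis,  # excess kurtosis; 0 for a normal distribution
    }


# Hypothetical anxiety scores; the values are invented.
scores = [4, 7, 7, 9, 12, 15, 21]
summary = describe(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness here would indicate a longer tail of high scores, which is the kind of signal a manager could use to target awareness campaigns.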
3.3 Applying an appropriate programming language or tool to demonstrate how predictive analytic techniques are used in forecasting future events
Predictive analytical tools such as Simple Linear Regression, Multiple Linear Regression and Logistic Regression are helpful in forecasting future events. Other techniques used for the same purpose include qualitative, average and naïve approaches as well as time-series methods and cause-and-effect models. Through their implementation, a company is able to gain valuable insights into its future plan of action from the likelihood of outcomes that may occur in the coming years.
For instance, if a business such as Billabong wants to ascertain the relationship between two variables, such as customer loyalty and the purchasing patterns of customers in the near future, it can employ a predictive analytical tool such as Linear Regression. A visualisation of such output using R is provided below:
The above graphical representation shows a scatter plot with a fitted regression line, from which one can see how closely or loosely the observations are dispersed around the line. Using this technique, the business manager of Billabong would be able to see that the purchasing activity of clients has little impact on customer loyalty. This tells them how purchasing patterns might be linked with customer loyalty in order to retain valuable clients in the long run.
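A simple linear regression of this kind can be sketched with ordinary least squares in Python. The purchase counts and loyalty scores are invented and do not reproduce the R output discussed above; the coefficient of determination is included because it quantifies how tightly the points sit around the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data: purchases per year and a loyalty score out of 10.
purchases = [2, 4, 5, 7, 9, 12, 14, 15]
loyalty = [6.1, 4.8, 7.2, 5.5, 6.9, 5.2, 7.4, 6.0]

slope, intercept, r_squared = fit_line(purchases, loyalty)
predicted = intercept + slope * 10  # predicted loyalty at 10 purchases

print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

An R squared close to 0 corresponds to points scattered loosely around the line, and a value close to 1 to points lying almost on it.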
3.4 Employing an appropriate programming language or tool to demonstrate how prescriptive analytic techniques are used to find the best course of action in a situation
In recent times, companies have adopted prescriptive analytic techniques such as classical optimisation, linear and non-linear programming, and dynamic programming. Even though these techniques have not yet been fully embedded in the core business processing systems of organisations around the world, some businesses have recognised their usefulness in choosing the best course of action in a given situation.
For instance, Vizdata, MIT's Endor, uses a query-builder that allows anyone to ask questions such as “which target market should our next store be located in?” or “which demographic is ideal for Product X?” (Vizdata, 019). Through query-building and optimisation, users are able to obtain answers to a range of such questions. A sample model is provided below:
The above graph indicates how Vizdata has applied the Prescriptive Analysis technique to ascertain what size of store should be opened and whether or not Store Layout should be given much emphasis.
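The optimisation idea behind prescriptive analytics can be sketched on a small production-planning problem. This Python example is not Vizdata's model: the products, profits and resource limits are invented, and the plan is found by brute-force enumeration of whole-unit quantities rather than by a linear programming solver.

```python
from itertools import product

# Hypothetical inputs; every figure is an assumption.
PROFIT = {"A": 30, "B": 50}    # profit per unit
HOURS = {"A": 1, "B": 2}       # machine hours per unit
MATERIAL = {"A": 3, "B": 2}    # kilograms of material per unit
HOURS_AVAILABLE = 40
MATERIAL_AVAILABLE = 60


def best_plan(max_units=60):
    """Return (profit, units of A, units of B) for the most profitable plan."""
    best = (0, 0, 0)
    for a, b in product(range(max_units + 1), repeat=2):
        hours = HOURS["A"] * a + HOURS["B"] * b
        material = MATERIAL["A"] * a + MATERIAL["B"] * b
        if hours <= HOURS_AVAILABLE and material <= MATERIAL_AVAILABLE:
            profit = PROFIT["A"] * a + PROFIT["B"] * b
            if profit > best[0]:
                best = (profit, a, b)
    return best


profit, units_a, units_b = best_plan()
print(f"Make {units_a} of A and {units_b} of B for a profit of {profit}")
```

Enumeration is practical only for very small problems; it is used here because it makes the logic of "try every feasible option and recommend the best" explicit.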
From the above report it can be concluded that Data Analytics has become a critical part of organisations worldwide. Without it, they face problems such as delayed decisions and delayed execution of plans of action on the part of management. It is also important to note that Descriptive, Predictive and Prescriptive analytical techniques, applied through programming languages and tools, enable the users of data to take action, forecast outcomes and identify the steps available to choose from. In addition, data visualisation is critical for communicating the information or output derived from analytical models in an effective manner.
Books and Journals:
- Andrejevic, M. and Gates, K., 2014. Big data surveillance: Introduction. Surveillance & Society. 12(2). pp.185-196.
- Chen, H., Chiang, R. H. and Storey, V. C., 2012. Business intelligence and analytics: From big data to big impact. MIS Quarterly. 36(4).
- Gandomi, A. and Haider, M., 2015. Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management. 35(2). pp.137-144.
- Hazen, B. T. et al., 2014. Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. International Journal of Production Economics. 154. pp.72-80.
- Kambatla, K. et al., 2014. Trends in big data analytics. Journal of Parallel and Distributed Computing. 74(7). pp.2561-2573.
- Kankanhalli, A. et al., 2016. Big data and analytics in healthcare: Introduction to the special section. Information Systems Frontiers. 18(2). pp.233-235.
- Kotu, V. and Deshpande, B., 2014. Predictive analytics and data mining: Concepts and practice with RapidMiner. Morgan Kaufmann.
- Manoochehri, M., 2013. Data just right: Introduction to large-scale data & analytics. Addison-Wesley.
- Michael, K. and Miller, K. W., 2013. Big data: New opportunities and new challenges [guest editors' introduction]. Computer. 46(6). pp.22-24.
- Miloslavskaya, N. and Tolstoy, A., 2016. Big data, fast data and data lake concepts. Procedia Computer Science. 88. pp.300-305.
- Russom, P., 2011. Big data analytics. TDWI best practices report, fourth quarter. 19(4). pp.1-34.
- Siemens, G. and Baker, R. S. J. d., 2012, April. Learning analytics and educational data mining: Towards communication and collaboration. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (pp. 252-254). ACM.