Thursday, March 3, 2016

Structured Data vs Unstructured Data




source: www.sherpasoftware.com/blog/structured-and-unstructured-data-what-is-it/
Structured data is data that can be organized and displayed in a table with columns and rows, and that can be handled and refined by data visualization tools; it is almost always in text form. Unstructured data is data that may be proprietary to another entity and may not have an identifiable internal structure. Its content can be grouped in clusters that are of no value until a process identifies them and stores them in an organized fashion. Specialized software exists for this process: it searches for items in the data and categorizes them in order to produce a “structured set” from the unstructured data.
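As a minimal sketch of what such categorizing software does, the Python snippet below searches a few free-text notes for dates and email addresses and stores the results as rows with fixed columns. The sample notes, the column names and the two regular expressions are assumptions made for illustration; real tools use far more elaborate techniques.

```python
import re

# Hypothetical unstructured input: free-text notes with no fixed layout.
notes = [
    "Call from ana@example.com on 2016-03-01 about a late order",
    "No contact details given, customer visited the store",
    "2016-02-18: refund requested, reply to bob@example.org",
]

# Simplified patterns, good enough for the sample notes above.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def to_row(text):
    """Search one free-text note and return it as a row with fixed columns."""
    email = EMAIL.search(text)
    date = DATE.search(text)
    return {
        "date": date.group() if date else None,
        "email": email.group() if email else None,
        "text": text,
    }


# The "structured set": every note now has the same columns.
rows = [to_row(note) for note in notes]
for row in rows:
    print(row["date"], row["email"])
```

Notes in which nothing is found keep empty (None) columns, so the table stays rectangular even when the source text is incomplete.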

There are two main ways to determine whether a data set is structured or unstructured, in addition to the definition of each type (that is, whether the data explicitly has or does not have a structure). First, if the data has some form of structure that has not been formally identified, but you can indirectly derive a way of identifying that structure, then the data should not be labeled as unstructured. Second, if the data is structured but you cannot derive an analysis from it, then you can consider the data unstructured.

Unstructured data can be found in many places, such as emails, word processing files, PDF files, spreadsheets, digital images, video, audio, social media posts and many others. These kinds of “documents” contain what is called rich data. Rich data includes pictures, video, voice, x-rays, PowerPoint presentations, GPS locations, “check-ins”, the people in a certain picture at a certain place, and so on.

While rich data types provide a remarkable opportunity for different angles of analysis compared with text alone, they do so at the expense of storage space. Rich media types are not just slightly larger than basic text; they can be orders of magnitude larger.


To help with this issue, organizations have turned to a variety of software solutions designed to search unstructured data and extract important information. The main advantage of these tools is the ability to gather valuable information that can help a business succeed in a competitive environment. Since the volume of unstructured data is growing so quickly, many enterprises also turn to technological solutions, both software and hardware, that help them better manage and store unstructured data. These can include hardware or software solutions that enable them to make the most efficient use of their available storage space. In the following figure you can see what tools are available and why organizations are using them.


source: http://www.webopedia.com/TERM/U/unstructured_data.html


The use of unstructured data does not imply that big data technologies will replace traditional data warehouses. Rather, they will exist in parallel. The traditional data warehouse (DW) will still play a fundamental part in the business. Financial analysis and other applications connected with the DW will still be important, and the DW itself will be a source of some of the data used in big data projects and will probably receive data from the results of advanced analysis projects. So big data is an evolution of, rather than a substitute for, the DW and data engines. Therefore, I think that in the future data visualization is going to collect and analyze information from a much wider spectrum and, because of this, become a tool for rich analysis.

Thursday, February 18, 2016

Helping Decision Making through a Dimensional Model: e.l.f. Cosmetics


“We believe that every woman should be able to look like a million bucks without having to spend that much on premium quality makeup.”

e.l.f. Cosmetics is a company that provides premium beauty products at very affordable prices. They offer a variety of products such as makeup, skin care products, beauty tools and many others, and they have several ways to sell and market them. First, they have the online shop, where customers can choose and order products directly from the web page http://www.elfcosmetics.com/. Second, they have a “Beauty Box” that is sent to the customer every 8 weeks once they have paid a recurring fee of $20. Each bundle is guaranteed to include at least $40 worth of full-size e.l.f. products. Third, they have a presence in stores like CVS, Walgreens and many others.

Customers highly recommend these products because of their quality and price. My first encounter with the product was through Facebook “target marketing”: they have a strong presence in the ads that I get on Facebook. After that, I went to the web page and immediately noticed their prices. I made my first purchase through the online store and, of course, I signed up for the “Beauty Box”. After a couple of weeks I found their products in a CVS store, and even in some university and college bookstores, but the variety was very limited.

I think that this company is getting a very good response from customers, and having so many options for getting their products is very convenient for the customer. If I need something I can just go to the CVS store or order it online. I can also “try” their new products by signing up for their “Beauty Box” membership.


Helping the Company with a Data Warehouse and Dimensional Modeling

This is a “young” company that has been well accepted in the market. Still, decision making will be very important for the future of the company, and a good marketing strategy is crucial for knowing where it can gain more revenue.

A dimensional model and Data Warehouse can be very helpful for the company’s decision making. One clear example that I can think of would be the “outcome” of the Beauty Box membership.

Because it is a membership, a customer has to sign up and create an account on their web page, so the company has basic demographic information about the customer, which helps answer “Who is buying our products?” The intention of this box is to get customers to try other products that they would not normally buy, either because they don’t know about them or because they “think” they don’t need them. The important thing to keep in mind is that the purpose of the box is not to generate revenue from the membership itself, but mostly to generate revenue through continued purchases of the products that the customer first tried in the box.

Because of this, we can identify various dimensions and facts to explore, and questions that can be answered through a dimensional model.

We can think about the customer and answer questions about demographics: Who are they? Who is more likely to keep buying our products after trying them through our Beauty Box? We can also think about the product: Which product categories increase their sales when they are added to the Beauty Box? And we can think about time: How long does it take individual customers to re-buy a product after they have received it through the box?

An example of an answer that a dimensional model can provide to help the CEO with decision making is the following:

-        When a lipstick is added to the Beauty Box, sales of that particular product in our online store are likely to increase by 30% two weeks after the product is released to the customer. A skin care product, on the other hand, would likely take up to four weeks to generate 20% higher demand in our online store after it is released to the customer through the Beauty Box.

o   This is an answer that can be derived from a dimensional model that uses product category, store type and date as dimensions and sales as the measured fact.

o   To create a dimensional model that answers this kind of question, it is important to use the different types of dimensional models, mainly transaction and periodic snapshot models.

o   This can help the CEO when it comes to restocking and inventory: predictions like this help keep the product available for the demand it is getting.
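The lipstick and skin care example above can be sketched as a tiny star schema. This is only an illustration using Python's built-in sqlite3 module: the table names, the column names (including week_offset) and every sales figure are made up, chosen to mirror the percentages in the example, and are not e.l.f. data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_type TEXT);
-- week_offset: weeks relative to the Beauty Box release (-1 = week before).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, week_offset INTEGER);
-- Fact table: units sold per product, store and week.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key INTEGER REFERENCES dim_store(store_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    units INTEGER
);
""")

# Made-up sample data for illustration only.
con.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(1, "Lipstick X", "lipstick"), (2, "Moisturizer Y", "skin care")])
con.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "online"), (2, "retail")])
con.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, -1), (2, 2), (3, 4)])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100), (1, 1, 2, 130), (1, 1, 3, 120),
    (2, 1, 1, 50), (2, 1, 2, 52), (2, 1, 3, 60),
    (1, 2, 1, 40), (1, 2, 2, 41),
])

# Online units per product category and week, joined through the dimensions.
rows = con.execute("""
    SELECT p.category, d.week_offset, SUM(f.units)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_store s ON s.store_key = f.store_key
    JOIN dim_date d ON d.date_key = f.date_key
    WHERE s.store_type = 'online'
    GROUP BY p.category, d.week_offset
""").fetchall()

totals = {(cat, week): units for cat, week, units in rows}

# Percentage change against the week before the box was released.
uplift = {
    (cat, week): 100 * (units - totals[(cat, -1)]) / totals[(cat, -1)]
    for (cat, week), units in totals.items()
    if week != -1
}
for key in sorted(uplift):
    print(key, uplift[key])
```

Filtering on store type, grouping by category and comparing weeks are all done through the dimension tables, which is what lets the same fact table answer many different questions.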


This is only one simple example, focusing on just one of their ways of making revenue (the “Beauty Box”). A dimensional model can help in many other areas, such as marketing, expanding their presence in local stores, deciding which products should be sold online and which should be sold locally, understanding how the market behaves when a new product is released, and so on.











Thursday, February 4, 2016

Business Intelligence & Analysis Products



Business Intelligence & Analysis Products Scan & Evaluation


I. The five BI & analysis products that are going to be compared are the following:



       


1. Tableau
  • Strengths:
    • Supports an incredibly vast array of data sources, and does so very efficiently
    • Tableau has put a lot of effort into developing a robust mobile client
    • It is a relatively low-cost solution
    • Integrates with most data types and offers out-of-the-box integration with a variety of big data platforms
    • Integrates with R (the statistical programming language)
  • Weaknesses:
    • If you want to connect to a database, a developer skilled in SQL will have to write the SQL query that pulls the data set
    • There is no versioning: once you overwrite a file, there is no way to recover previous versions
2. Qlik
  • Strengths:
    • QlikView is very user friendly, which makes it easy to design, develop and deploy
    • It comes with various tools for building meaningful dashboards and reports that meet the needs of the organization
  • Weaknesses:
    • Deployment can become expensive
3. SAS
  • Strengths:
    • It delivers reporting, analysis, and interpretation of business data that is crucial to preserve and enhance the competitive edge of companies by optimizing processes and enabling them to react quickly to meet market opportunity (Source)
  • Weaknesses:
    • Cost: the SAS Business Intelligence Suite (Analytics Pro), which includes the base SAS server, SAS/STAT and SAS/GRAPH, is $8000
    • Additional users cost $1710 per user (Source)
    • The size of the data sets analyzed in SAS is generally bottlenecked by the size of the hard disk
4. SAP
  • Strengths:
    • The product is scalable
    • Ad-hoc information
  • Weaknesses:
    • It is expensive
5. Microsoft
  • Strengths:
    • Cost is one of the main reasons customers choose Microsoft
    • Integrated with Excel
    • The new standalone version of Power BI allows connectivity to on-premises SQL Server Analysis Services
    • You can avoid moving or replicating data in the cloud while having a multidimensional data structure
    • It is going to support Apple and Android devices and cloud deployment
  • Weaknesses:
    • The product portfolio is too wide, which can confuse customers
    • No drill-through capabilities in Power View
    • It is difficult to find external help for implementation




II. The criteria on which these products will be evaluated are the following:
  1. User Experience / User Friendliness
  2. Internal Platform Integration
  3. Cloud Deployment
  4. Support / Documentation
  5. Cost
  6. Mobile
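One possible way to turn these six criteria into a single comparable number is a weighted score. The sketch below is only an assumption about how that could be done: the weights, the 1 to 5 scale and the two placeholder products ("Product A" and "Product B") are hypothetical and are not the evaluation results for the five products above.

```python
# Hypothetical weights in percentage points; they must add up to 100.
CRITERIA_WEIGHTS = {
    "user_experience": 25,
    "integration": 20,
    "cloud_deployment": 15,
    "support_documentation": 15,
    "cost": 15,
    "mobile": 10,
}


def weighted_score(scores, weights=CRITERIA_WEIGHTS):
    """Combine per-criterion scores (1 = poor, 5 = excellent) into one number."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights) / 100


# Placeholder products with made-up scores, for illustration only.
products = {
    "Product A": {name: 3 for name in CRITERIA_WEIGHTS},
    "Product B": {
        "user_experience": 5,
        "integration": 4,
        "cloud_deployment": 3,
        "support_documentation": 3,
        "cost": 2,
        "mobile": 4,
    },
}

# Highest weighted score first.
ranking = sorted(products, key=lambda name: weighted_score(products[name]), reverse=True)
for name in ranking:
    print(name, weighted_score(products[name]))
```

Changing the weights is how a company would express its own priorities, for example raising the weight of cost if budget is the main constraint.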

III. Evaluation of the products:






IV. Criteria Discussion:

  1. User Experience
    • Following the definition in Gartner's Magic Quadrant, I use this criterion as follows: "The ability to create highly interactive dashboards and content with visual exploration and embedded advanced and geospatial analytics to be consumed by others." (Source)
  2. Integration
    • It is important to know how easy it is to integrate a product with the tools currently used in a company. For example, a large number of companies use SQL databases, but many BI products are very easy to use with Excel sheets and a little more complicated to integrate with SQL. (Integrating the product with SQL can avoid data replication.)
  3. Cloud Deployment
    • It is important to be able to use cloud resources if local resources are not available in the company, because buying local storage (servers) can become very expensive.
  4. Documentation
    • It is important that users of the product are provided with a variety of tools and services, so that they can feel comfortable with their investment.
  5. Mobile
    • Mobile is an important criterion because it can take advantage of the characteristics of a native application on a mobile device and expand access to the information.
  6. Cost
    • Cost has to be considered because the product is an investment for the company: not only the product's price matters, but also the implementation costs.