We have seen some of the technologies used in Big Data; now we will look at the major vendors that make up the Big Data market.
Major vendors in Hadoop
A number of vendors offer Hadoop distributions.
- Hortonworks (HCatalog: Hive/Pig/MapReduce interoperability)
- MapR (its own file system, MapR-FS, replaces HDFS and can be accessed over NFS)
- IBM InfoSphere BigInsights
Hadoop and Cloud Computing
- Amazon Elastic MapReduce: Amazon's own distribution or MapR M3/M5
- It is very popular.
- It is standardized and relatively easy to use.
- Queries are run by connecting to the cluster over SSH.
Hadoop on Azure
- Very simple provisioning
- Queries can be run in the browser
- Queries can also be run from Excel and other tools via the Hive ODBC driver
Google Compute Engine
- MapR Platform as a Service
Apache Whirr
- Libraries for running cloud services
- Works on AWS EC2 and Rackspace
- For Cloudera's distribution, a single command builds a cluster
- One more command de-provisions it
- Several companies are working to make Hadoop run on enterprise storage instead of HDFS.
- MapR is the biggest of these.
- Others include EMC, NetApp, Cleversafe and Symantec.
NoSQL and NewSQL Vendors
- Cloud (Amazon Web Services DynamoDB and SimpleDB, Windows Azure Tables, Cloudant)
- JustOneDB
Massively Parallel Processing (MPP) products
- Vertica (an HP company)
- Netezza (an IBM company)
- Teradata Aster
Data Integration, Visualization and Analytics
Data Visualization and Analytics
All of the above
Business Intelligence (BI) Vendors
Next, we need to work out the questions to consider, the decisions to make, the pitfalls to avoid, and the roadmap we can follow to bring Big Data into our organizations.
Consider the following points before implementing Big Data in your organization:
- Are your data volumes truly 'Big'? Are you collecting enough data?
- Established technologies can already handle large amounts of data.
- Big Data technology or a more conventional data warehouse?
- Hadoop or MPP or Hybrid?
- NoSQL or NewSQL?
- On-premises or Cloud?
Hadoop Distribution Choices
- Major or minor vendor?
- Cloudera or Hortonworks?
- Cloud: Amazon, MapR or Microsoft?
- MapR? Hadapt?
- Microsoft – browser-based tooling
- Amazon – Elastic MapReduce
- Cloudera and others – premium distributions
- Existing query clients – via Hive, or via conventional databases and Sqoop