1. Determine the context for using Big Data Analysis
2. Recognize the features and benefits of SQL dialects designed to work with big data systems for storage and analysis
3. Access and Process Data on a Distributed File System
4. Manage Job Execution in a Hadoop Environment
5. Develop Big Data Solutions using the Hadoop Ecosystem
6. Realistically assess the applicability of big data analytics technologies to different usage scenarios and start their own experiments.
The Big Data Analytics course provides students with a comprehensive understanding of the tools, techniques, and underlying principles required to analyze and extract valuable insights from large and complex data sets. The course is designed to equip students with both the theoretical knowledge and the practical skills necessary for a career in data science and analytics. The course comprises:
1. Introduction to Big Data Analysis, Nature of Big Data, Challenges, Graph Data Analysis, Introduction to Stream Concepts, Stream Data Model and Architecture, Sampling Data in a Stream, Filtering Streams, Real-Time Analytics, Case Study: Real-Time Sentiment Analysis.
2. Types of Analytics: Prescriptive Analytics, Customer Analytics, Descriptive Analytics, Data Collection, Media Planning, Causal Data, Regression Analysis: the Demand Curve and Making Predictions, Probability Models, Applications: ROI.
3. Databases for Big Data Analytics: SQL Essentials, Filtering Data, Distinction between operational and analytic databases, Limitations of Traditional RDBMSs, SQL for Structured, Semi-Structured and Unstructured Data, Big Data Analytic Databases, NoSQL: Operational, Unstructured and Semi-Structured, Non-transactional, Structured Systems, Big Data ACID-Compliant RDBMSs, SQL Tools for Big Data Analysis.
4. Hadoop and MapReduce: The Design of HDFS, HDFS Concepts, Command Line Interface, Hadoop File System Interfaces, Data Flow, Data Ingest with Flume and Sqoop, Hadoop Archives, Hadoop I/O: Compression, Serialization, Avro and File-Based Data Structures, Anatomy of a MapReduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce Features. Tools for Big Data Analysis: Apache Hive, Business Use Cases: Solution with Hive, Hive DDL and Hive DML, Hive Analytics: UDF, UDAF, UDTF, Hive Streaming, Apache Impala, Exploring Structured Data in Hue, Spark SQL.
- Teacher: MOHD REHAN GHAZI