Pro Hadoop Data Analytics: Designing and Building Big Data Systems Using the Hadoop Ecosystem

ISBN-13: 9781484219096 / English / Paperback / 2016 / 298 pp.

Kerry Koitzsch

Price: 160.99
(net: 153.32; VAT: 5%)

Lowest price in the last 30 days: 154.18

Order fulfillment time:
approx. 16-18 business days.

Free delivery!

Learn advanced analytical techniques and leverage existing tool kits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems that go beyond the basics of classification, clustering, and recommendation.

Pro Hadoop Data Analytics emphasizes best practices to ensure coherent, efficient development. A complete example system will be developed using standard third-party components that consist of the tool kits, libraries, visualization and reporting code, as well as support glue to provide a working and extensible end-to-end system.

The book also highlights the importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. You'll discover the importance of mix-and-match or hybrid systems, using different analytical components in one application. This hybrid approach will be prominent in the examples.

What You'll Learn

Build big data analytic systems with the Hadoop ecosystem
Use libraries, tool kits, and algorithms to make development easier and more effective
Apply metrics to measure performance and efficiency of components and systems
Connect to standard relational databases, noSQL data sources, and more
Follow case studies with example components to create your own systems

Who This Book Is For
Software engineers, architects, and data scientists with an interest in the design and implementation of big data analytical systems using Hadoop, the Hadoop ecosystem, and other associated technologies.

Categories: Computer Science, Internet

BISAC categories:
Computers > Programming - Object Oriented
Computers > Languages - General
Computers > Data Science - Data Analytics

Publisher: Apress
Language: English
ISBN-13: 9781484219096
Year of publication: 2016
Number of pages: 298
Weight: 0.56 kg
Dimensions: 25.4 x 17.78 x 1.73
Binding: Paperback
Volumes:

Additional information:
Bibliography
Illustrated edition

[PART I: CONCEPTS]

Chapter 1: Overview: Building Data Analytic Systems with Hadoop

In this chapter we discuss what analytic systems using Hadoop are, why they are important, data sources which may be used, and applications which are, and are not, suitable for a distributed system approach using Hadoop.

Subtopics:

1. Introduction: The Need for Distributed Analysis

2. How the Hadoop Ecosystem Implements Big Data Analysis

3. A Survey of the Hadoop Ecosystem

4. Architectures for Building

5. Summary

Chapter 2: Programming Languages: A Scala and Python Refresher

This chapter consists of a concise overview of the Scala and Python programming languages, and details why these languages are important ingredients of most modern Hadoop analytical systems. The chapter is primarily aimed at Java/C++ programmers who need a quick review/introduction to the Scala and Python programming languages.

Subtopics:

1. Motivation: Selecting the Right Language(s) Defines the Application

2. Review of Scala

3. Review of Python

4. Programming Applications and Examples

5. Summary

Chapter 3: Necessary Ingredients: Standard Toolkits for Hadoop and Analytics

In this chapter we describe an example system which we develop throughout the remainder of the book using standard toolkits from the Hadoop ecosystem, and other analytical toolkits in combination with development components such as Maven, OpenCV, Apache Mahout, and others to create a Hadoop-based system appropriate for a variety of applications.

Subtopics:

1. Libraries, Components, and Toolkits: A Survey

2. Numerical and Statistical Libraries; R, Weka, and Others

3. Hadoop Toolkits for Analysis: Mahout and Friends

4. Apache Spark Libraries and Components: H2O, Sparkling Water, and More

5. Examples of Use and System Building

6. Summary

Chapter 4: Relational, noSQL, and Graph Databases

In this chapter we describe relational databases such as MySQL, NoSQL databases such as Cassandra, and graph databases such as Neo4j, how to integrate them with the Hadoop ecosystem, and how to create customized data sources and sinks using Apache Camel.

Subtopics:

1. Introduction to Databases: Relational, NoSQL, and Graph

2. Relational Data Sources

3. noSQL Data Sources: Cassandra

4. Graph Databases: Neo4j

5. Integrating Data with the Analytical Engine

6. Summary

Chapter 5: Data Pipelines and How to Construct Them

In this chapter we describe how to construct basic data pipelines using data sources and the Hadoop ecosystem. We provide an end-to-end example of how data sources may be linked and processed using Hadoop and other analytical components, and how this is similar to a standard ETL process.

Subtopics:

1. The Basic Data Pipeline

2. Data Sources and Sinks

3. Computation and Transformation

4. Visualizing and Reporting the Results

5. Summary

Chapter 6: Advanced Search Techniques with Hadoop, Lucene, and Solr

In this chapter we describe the structure and use of the Lucene and Solr third-party search engine components, how to use them with Hadoop, and how to develop advanced search capability customized for an analytical application.

Subtopics:

1. Introduction to Customized Search Engines

2. Distributed Search Techniques

3. Basic Examples: A Custom Search Component

4. Extended Examples: Scaling, Tuning, and Customizing the Search Component

5. Summary

[PART II: ARCHITECTURES AND ALGORITHMS]

Chapter 7: An Overview of Analytical Techniques and Algorithms

In this chapter, we provide an overview of four categories of algorithm: statistical, Bayesian, ontology-driven, and hybrid algorithms which leverage the more basic algorithms found in standard libraries to perform more in-depth and accurate analyses using Hadoop.

Subtopics:

1. Survey of Algorithm Types

2. Statistical / Numerical Techniques

3. Bayesian Techniques

4. Ontology Driven Algorithms

5. Hybrid Algorithms: Combining Algorithm Types

6. Code Examples

7. Summary

Chapter 8: Rule Engines, System Control, and System Orchestration

In this chapter, we describe the Drools rule engine and how it may be used to control and orchestrate Hadoop analysis pipelines. We describe an example rule-based controller which can be used for a variety of data types and applications in combination with the Hadoop ecosystem.

Subtopics:

1. Introduction to Rule Systems: Drools

2. Rule-Based Software System Control

3. System Orchestration with Drools

4. Analytical Engine Example with Rule Control

5. Summary

Chapter 9: Putting it All Together: Designing a Complete Analytical System

In this chapter, we describe an end-to-end design example, using many of the components discussed so far, as well as ‘best practices’ to use during the requirements acquisition, planning, architecting, development, and test phases of the system development project.

Subtopics:

1. Goals and Requirements for Analytical System Building

2. Architecture

3. Initial Code Framework Example

4. Extended Code Framework Example

5. Summary

[PART III: COMPONENTS AND SYSTEMS]

Chapter 10: Using Library Components for Statistical Analytics and Data Mining

In this chapter, we describe four standard statistical analysis packages: R/Weka, MLlib, Mahout, and Numpy Extended. These toolkits are used to develop a data mining example using a Hadoop cluster and a variety of the Hadoop ecosystem components to provide a dashboard-based result report.

Subtopics:

1. A Survey of Data Mining Techniques and Applications

2. R/Weka Example

3. Numpy Extended Example

4. Integration with Hadoop Analytical Components

5. Data Mining Example

6. Summary

Chapter 11: Semantic Web Technologies and Natural Language Processing

In this chapter, we describe the use of knowledge information sources such as taxonomies, ontologies, and grammars, why they are useful, and how to integrate them with Hadoop analytical components as well as with natural language processing components to provide an added layer of ease-of-use to an analytical system.

Subtopics:

1. Introduction to Semantic Web Technologies

2. Semantic Web For Hadoop (Examples)

3. Data Integration with Semantic Web Technologies

4. Code Examples with Data Integration using Apache Camel

5. Extended Example

6. Summary

Chapter 12: Machine Learning Components with Hadoop

In this chapter, we discuss a number of machine learning components including neural net, genetic algorithm, Markov modeling, and hybrid components, and how they may be used with the Hadoop ecosystem to provide cognitive computing elements to an analytical engine.

Subtopics:

1. Introduction: The Need for Machine Learning

2. Machine Learning Toolkits and Hadoop

3. Code Examples using Apache Mahout

4. Extended Code Examples

5. Neural Nets, Genetic Algorithms, and Hybrids

6. Summary

Chapter 13: Data Visualizers: Seeing and Interacting with the Analysis

In this chapter, we discuss how to create data visualization components, connect them with the analytical modules of the system, and how to provide the user with the ability to interact with the charts, dashboards, and reports.

Subtopics:

1. Introduction to Data Visualization: The Need to See Results

2. Visualizers for Simple Data: Some Examples

3. Data Visualizers and Hadoop: Some Examples

4. Visualizers for more than Two Dimensions (three-D examples and extended plots/charting)

5. Summary: Future Directions for Data Visualization

[PART IV: CASE STUDIES AND APPLICATIONS]

Chapter 14: A Case Study in Bioinformatics: Analyzing Microscope Slide Data

In this chapter, we describe an application to analyze microscopic slide data such as might be found in medical examinations of patient samples. We illustrate how a Hadoop system might be used on a small Hadoop cluster to organize, analyze, and correlate bioinformatic data.

Subtopics:

1. Introduction to Bioinformatics

2. Analyzing Microscope Slide Data Automatically

3. Basic Examples

4. Extended Examples

5. Summary

Chapter 15: A Bayesian Analysis Software Component: Identifying Credit Card Fraud

In this chapter, we describe a Bayesian analysis component plugin which may be used to analyze credit card transactions in order to identify fraudulent use of the credit card by illicit users.

Subtopics:

1. Introduction to Bayesian Analysis

2. The Problem of Credit Fraud and Possible Solutions

3. Basic Applications of the Data Models

4. Examples of Fraud Detection

5. Summary

Chapter 16: Searching for Oil: Geological Data Analysis with Mahout

In this chapter, we describe a system which uses geospatial data, ontologies, and other semantic web information to predict where geological resources, such as oil or bauxite (aluminum ore) might be found.

Subtopics:

1. Introduction to the Geospatial Data Arena

2. Components and Architecture

3. Data Sources for Geospatial Data

4. Basic Examples and Visualizations

5. Extended Examples

6. Summary

Chapter 17: ‘Image as Big Data’ Systems: Some Case Studies

In this chapter, we describe the use of ‘images as big data’ and how image data may be used in combination with the Hadoop ecosystem to provide information for a variety of systems.

Subtopics:

1. Introduction to the Image as Big Data Concept

2. Components and Architecture

3. Data Sources for Imagery and How to Use Them

4. The Image as Big Data Pipeline

5. Examples

6. Summary

Chapter 18: A Generic Data Pipeline Analytical System

In this chapter, we detail an end-to-end analytical system using many of the techniques we discussed throughout the book to provide an evaluation system the user may extend and edit to create her own Hadoop data analysis system.

Subtopics:

1. Architecture and Description of Example System

2. How to obtain and run the system

3. Basic examples

4. Extended Examples

5. How to extend the system for custom applications

6. Summary

Chapter 19: Conclusions and The Future of Big Data Analysis

In this chapter we sum up what we have learned in the previous chapters and discuss some of the developing trends in big data analysis, including ‘incubator’ projects and ‘young’ projects for data analysis, and we speculate on what the future holds for big data analysis and the Hadoop ecosystem (it can only continue to grow).

Subtopics:

1. Conclusions: The Current State of Hadoop Data Analytics

2. Future Hadoop Analysis: Speculations

Kerry Koitzsch is a software engineer with an interest in the early history of science, particularly chemistry. He frequently publishes papers and attends conferences on scientific and historical topics, including early chemistry and alchemy, and sociology of science. He has presented many lectures, talks, and demonstrations on a variety of subjects for the United States Army, the Society for Utopian Studies, American Association for Artificial Intelligence (AAAI), Association for Studies in Esotericism (ASE), and others. He has also published several papers and written two historical books.

Kerry was educated at Interlochen Arts Academy, MIT, and the San Francisco Conservatory of Music. He served in the United States Army and United States Army Reserve, and is the recipient of the United States Army Achievement Medal. He has been a software engineer specializing in computer vision, machine learning, and database technologies for 30 years, and currently lives and works in Sunnyvale, California.

In Pro Hadoop Data Analytics, best practices are emphasized to ensure coherent, efficient development. A complete example system will be developed using standard third-party components which will consist of the toolkits, libraries, visualization and reporting code, as well as support glue to provide a working and extensible end-to-end system.

The book emphasizes four important topics:

The importance of end-to-end, flexible, configurable, high-performance data pipeline systems with analytical components as well as appropriate visualization results. Deep-dive topics will include Spark, H2O, Vowpal Wabbit (NLP), Stanford NLP, and other appropriate toolkits and plugins.
The importance of mix-and-match or hybrid systems, using different analytical components in one application to accomplish application goals. The hybrid approach will be prominent in the examples.
Use of existing third-party libraries is key to effective development. Deep dive examples of the functionality of some of these toolkits will be showcased as you develop the example system.
