ISBN-13: 9781484256527 / English / Paperback / 2020 / 237 pp.
Intermediate user level
Introduction (10 pages)
“Without a systematic way to start and keep data clean, bad data will happen.” – Donato Diorio, CEO of RingLead
In this introduction, I talk about what data integration is and what it is not. I provide a working definition and explain why, in this text, I use migration and integration interchangeably. I describe why data integration matters, the use cases in which it is most successful, and what can happen when integrations fail. I caution that readers need to be familiar with basic database practices (how databases work in collaboration with departments) and have a basic understanding of workflows and process handoffs. I also recommend that readers have some experience with business templates such as Requirements Documents and Sequence Diagrams.
Chapter 1: Integration Background (30 pages)
I start with a brief history of data migration, from its earliest days to the more modern era of cloud migration. Through this, readers will be able to grasp the advances in architecture, speed, and complexity. We then discuss how integration is a process rather than a product, and how “owning a process” requires a different perspective than the more familiar “owning a product” that software development entails. We end the chapter with a discussion of integration approaches, ranging from one-time migrations to nightly integrations, listener services, scheduled services, and hybrid approaches.
1) Brief history
2) Process Ownership vs. Product Ownership
3) Integration Approaches
a. One-time Migration
b. Nightly Integration
c. Listener services
d. Scheduled services
e. Hybrid approach
Chapter 2: Key Terms (20 pages)
Integration developed from a combination of both technical and business mindsets and consequently has several terms that require some understanding. This section is designed to clarify what these terms mean and why they matter in the integration process.
Some of the terms to be discussed include the following; a short sketch after the list shows how they fit together:
1) Metadata
2) Source
3) Target
4) Mapping
5) Extraction, Transformation, and Loading (ETL)
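To make these terms concrete, here is a minimal sketch in Python, assuming a hypothetical customer feed; the field names and functions are illustrative stand-ins rather than examples from the book:

    # Mapping: how fields in the source line up with fields in the target.
    FIELD_MAPPING = {
        "cust_name": "AccountName",   # source column -> target column
        "cust_email": "Email",
    }

    # Metadata: data about the data, such as where the feed came from.
    metadata = {"source": "legacy_crm.customers", "extracted_rows": 0}

    def extract(source_rows):
        """Extraction: pull raw records from the source system."""
        return list(source_rows)

    def transform(rows):
        """Transformation: reshape source fields to match the target."""
        return [{FIELD_MAPPING[k]: v for k, v in row.items() if k in FIELD_MAPPING}
                for row in rows]

    def load(rows, target):
        """Load: write the transformed records into the target system."""
        target.extend(rows)

    source = [{"cust_name": "Acme Co.", "cust_email": "info@acme.example"}]
    target = []
    rows = extract(source)
    metadata["extracted_rows"] = len(rows)
    load(transform(rows), target)
    print(target)  # [{'AccountName': 'Acme Co.', 'Email': 'info@acme.example'}]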
Chapter 3: Team Qualifications (20 pages)
The integration team consists of individuals who are technical engineers, business-focused analysts, great communicators, and experienced coders. While some teams are quite small (I have seen teams of two people or fewer), others can be much larger (five to nine members is a typical size). This chapter discusses the roles and responsibilities required to create a great integration team.
1) Data Integration Architect
2) Developer
3) Release Manager
4) Project Manager
5) Stakeholders
Chapter 4: Finding your Purpose: Project Deliverables (30 pages)
With a team structure firmly in mind, the next step is to determine what type of functional, business, and technical requirements should be captured and documented. Data integrations tend to be very fluid, and often, mappings can change multiple times before the business provides final signoff. If large datasets reveal inexplicable errors, developers must quickly research, code, and communicate workarounds to stakeholders. The best way to do this is through a combination of good business requirements, an understanding of who can support change requests, and the right documents to communicate the integration approach.
1) The purpose of Business Requirements
2) A good communication plan: Knowing your Points of Contact
3) Primary documents
a. High-Level Design Document
b. Source to Target Mapping (see the sketch after this list)
c. Sequence Diagram
d. Architecture Diagram
e. ETL Screenshots
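As a taste of what a Source to Target Mapping captures, here is a minimal sketch; real mapping documents are usually spreadsheets, and the tables, fields, and transformation rules below are hypothetical:

    # A hypothetical Source to Target Mapping reduced to its typical columns.
    mapping_rows = [
        {"source_table": "CUSTOMERS", "source_field": "CUST_NAME",
         "target_object": "Account", "target_field": "Name",
         "transformation": "trim whitespace"},
        {"source_table": "CUSTOMERS", "source_field": "CREATED_DT",
         "target_object": "Account", "target_field": "CreatedDate",
         "transformation": "convert MM/DD/YYYY to ISO 8601"},
    ]

    # Print each mapping line in "source -> target (rule)" form for review.
    for row in mapping_rows:
        print(f'{row["source_table"]}.{row["source_field"]} -> '
              f'{row["target_object"]}.{row["target_field"]} '
              f'({row["transformation"]})')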
Chapter 5: ETL (35 pages)
Depending on the type of integration required, developers and architects need experience with database services, cloud platforms, XML, and underlying communication protocols such as REST and SOAP. Tying these technologies together requires the right ETL tool. This chapter discusses how ETL typically operates and includes a sample script to demonstrate the steps to build a simple service (a minimal sketch follows the outline below). We then review the current ETL software leaders in the marketplace, ranging from tools that demand the least experience (but offer little flexibility) to the more advanced (with plenty of bells and whistles). The key here is to establish the right tool for the right job without having to learn a series of new technologies overnight.
1) An introduction to ETL
a. Extraction (Connectors, Part 1)
b. Transformation
c. Load (Connectors, Part 2)
2) Sample exercise
3) Popular Software
a. Jitterbit
b. Talend
c. Dell Boomi
d. Pentaho
e. SQL Server (SSIS)
f. MuleSoft
g. Informatica
h. Scripting languages (Python, Perl)
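To preview the sample exercise, the sketch below walks the three ETL steps using only the Python standard library; the file name, table, and field names are hypothetical stand-ins for a real source and target:

    import csv
    import sqlite3

    def extract(path):
        # Extraction: read raw rows from the source file.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Transformation: normalize emails and drop rows missing a name.
        return [
            {"name": r["name"].strip(), "email": r["email"].lower()}
            for r in rows
            if r.get("name", "").strip()
        ]

    def load(rows, conn):
        # Load: write the cleaned rows into the target table.
        conn.execute("CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT)")
        conn.executemany(
            "INSERT INTO contacts (name, email) VALUES (:name, :email)", rows
        )
        conn.commit()

    if __name__ == "__main__":
        conn = sqlite3.connect("target.db")
        load(transform(extract("source_contacts.csv")), conn)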
Chapter 6: Platform Automation (20 pages)
One of the hallmarks of a good integration design is that it should be repeatable. The initial load scripts should be tested rigorously in Development and Test environments, and daily processes should run continuously in Production environments, changing where needed to fit the growing needs of the business. Much of this approach relates directly to a DevOps model and bears discussing as it relates to data integration. A sample back-end test follows the outline below.
1) Environment Builds
2) Running your pipeline through orchestrations
3) File Version Control
4) Testing
a. Back-End
b. Front-End
5) Deploying through release management
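As one example of a back-end test, this minimal sketch checks that the target holds one row per source row after a run; the database files and table names are hypothetical, and a real suite would point at the Development or Test environment:

    import sqlite3

    def test_row_counts_match():
        # After a nightly run, the target should hold one row per source row.
        source = sqlite3.connect("source.db")
        target = sqlite3.connect("target.db")
        src = source.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
        tgt = target.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
        assert src == tgt, f"{src} source rows but {tgt} rows loaded"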
Chapter 7: Monitoring Results (25 pages)
The design is complete, the business is satisfied with the requirements, and the integration has gone live. The only thing left to do is to start monitoring the results of the integration. In this chapter, we implement a PDCA (Plan-Do-Check-Act) cycle to improve the integration output, providing daily success and error counts to users through emails and other notification channels. We discuss ways to identify Type I and Type II errors and to make sure the owners of the data systems know how to resolve their issues once discovered. We end with a brief discussion of using the integration as a feeder into business intelligence, potentially using predictive analytics to discover gaps in the data that can lead to additional integration projects.
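As a minimal sketch of the daily check step, assume the integration writes one log record per row with a status field; the SMTP host and addresses below are placeholders:

    import smtplib
    from email.message import EmailMessage

    def send_daily_counts(log_rows, smtp_host="mail.example.com"):
        # Check: tally the run's outcomes from the integration log.
        successes = sum(1 for r in log_rows if r["status"] == "success")
        errors = len(log_rows) - successes
        # Act: alert the data owners so they can resolve the errors they own.
        msg = EmailMessage()
        msg["Subject"] = f"Integration run: {successes} loaded, {errors} errors"
        msg["From"] = "integration@example.com"
        msg["To"] = "data-owners@example.com"
        msg.set_content(f"Successes: {successes}\nErrors: {errors}")
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)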
1) The Continuous Improvement Model
2) Identifying Successes and Errors
3) Alerting Teams Through Different Notification Channels
4) Using Analytics to generate future integrations
Chapter 8: Building Outward: Marketing to the Enterprise (20 pages)
By this point, the integration team should have deployed at least a few projects and earned accolades for their accomplishments. This is no time to rest on their laurels, however. Before the excitement dies down, teams should reach out to departments across the enterprise, identifying the future projects that make the best candidates for data integration. Part of this approach will involve advertising to make external teams aware of the work the integration team has done, performing educational activities such as Lunch-and-Learns, and becoming involved with building an Integration Center of Excellence (which I will discuss at some length). Understanding the goals of the enterprise for the upcoming financial year also has value; with some research and creativity, the team can construct an integration data roadmap. This visual summary, similar to a product roadmap, maps out the vision and direction of integration offerings, which departments they serve, and the estimated time to completion.
1) Advertising
2) Education
3) Building an Integration Center of Excellence
4) Data Roadmaps
Appendix A: Sample Templates
The Appendix provides follow-up material, including the code and sample templates (which are also stored in GitHub). I'd also like to include some additional weblinks to the ETL software (we might want to move that section to the Appendix if it doesn't fit with the rest of the book).
Jarrett Goldfedder is the founder of InfoThoughts Data, LLC, a company that specializes in data management, migration, and automation. He has significant experience with both cloud-based and on-premises technologies and holds various certifications in Salesforce Administration, Dell Boomi Architecture, and Informatica Cloud Data. He also served as a technical reviewer of the Apress book by David Masri, Developing Data Migrations and Integrations with Salesforce: Patterns and Best Practices.
Find the right people with the right skills. This book clarifies best practices for creating high-functioning data integration teams, enabling you to understand the skills and requirements, documents, and solutions for planning, designing, and monitoring both one-time migration and daily integration systems.
Data migrations and integrations can be complicated. In many cases, project teams save the actual migration for the last weekend of the project, and any issues can lead to missed deadlines or, at worst, corrupted data that needs to be reconciled post-deployment. This book details how to plan strategically to avoid these last-minute risks as well as how to build the right solutions for future integration projects.