ISBN-13: 9781119716747 / English / Hardcover / 2022 / 592 pp.
Preface

1 Introduction
1.1 Data Science: Statistics, Probability, Calculus ... Python (or Perl) and Linux
1.2 Informatics and Data Analytics
1.3 FSA-Based Signal Acquisition and Bioinformatics
1.4 Feature Extraction and Language Analytics
1.5 Feature Extraction and Gene Structure Identification
1.5.1 HMMs for Analysis of Information Encoding Molecules
1.5.2 HMMs for Cheminformatics and Generic Signal Analysis
1.6 Theoretical Foundations for Learning
1.7 Classification and Clustering
1.8 Search
1.9 Stochastic Sequential Analysis (SSA) Protocol (Deep Learning Without NNs)
1.9.1 Stochastic Carrier Wave (SCW) Analysis - Nanoscope Signal Analysis
1.9.2 Nanoscope Cheminformatics - A Case Study for Device "Smartening"
1.10 Deep Learning Using Neural Nets
1.11 Mathematical Specifics and Computational Implementations

2 Probabilistic Reasoning and Bioinformatics
2.1 Python Shell Scripting
2.1.1 Sample Size Complications
2.2 Counting, the Enumeration Problem, and Statistics
2.3 From Counts to Frequencies to Probabilities
2.4 Identifying Emergent/Convergent Statistics and Anomalous Statistics
2.5 Statistics, Conditional Probability, and Bayes' Rule
2.5.1 The Calculus of Conditional Probabilities: The Cox Derivation
2.5.2 Bayes' Rule
2.5.3 Estimation Based on Maximal Conditional Probabilities
2.6 Emergent Distributions and Series
2.6.1 The Law of Large Numbers (LLN)
2.6.2 Distributions
2.6.3 Series
2.7 Exercises

3 Information Entropy and Statistical Measures
3.1 Shannon Entropy, Relative Entropy, Maxent, Mutual Information
3.1.1 The Khinchin Derivation
3.1.2 Maximum Entropy Principle
3.1.3 Relative Entropy and Its Uniqueness
3.1.4 Mutual Information
3.1.5 Information Measures Recap
3.2 Codon Discovery from Mutual Information Anomaly
3.3 ORF Discovery from Long-Tail Distribution Anomaly
3.3.1 Ab Initio Learning with smORFs, Holistic Modeling, and Bootstrap Learning
3.4 Sequential Processes and Markov Models
3.4.1 Markov Chains
3.5 Exercises

4 Ad Hoc, Ab Initio, and Bootstrap Signal Acquisition Methods
4.1 Signal Acquisition, or Scanning, at Linear Order Time-Complexity
4.2 Genome Analytics: The Gene-Finder
4.3 Objective Performance Evaluation: Sensitivity and Specificity
4.4 Signal Analytics: The Time-Domain Finite State Automaton (tFSA)
4.4.1 tFSA Spike Detector
4.4.2 tFSA-Based Channel Signal Acquisition Methods with Stable Baseline
4.4.3 tFSA-Based Channel Signal Acquisition Methods Without Stable Baseline
4.5 Signal Statistics (Fast): Mean, Variance, and Boxcar Filter
4.5.1 Efficient Implementations for Statistical Tools (O(L))
4.6 Signal Spectrum: Nyquist Criterion, Gabor Limit, Power Spectrum
4.6.1 Nyquist Sampling Theorem
4.6.2 Fourier Transforms, and Other Classic Transforms
4.6.3 Power Spectral Density
4.6.4 Power-Spectrum-Based Feature Extraction
4.6.5 Cross-Power Spectral Density
4.6.6 AM/FM/PM Communications Protocol
4.7 Exercises

5 Text Analytics
5.1 Words
5.1.1 Text Acquisition: Text Scraping and Associative Memory
5.1.2 Word Frequency Analysis: Machiavelli's Polysemy on Fortuna and Virtu
5.1.3 Word Frequency Analysis: Coleridge's Hidden Polysemy on Logos
5.1.4 Sentiment Analysis
5.2 Phrases - Short (Three Words)
5.2.1 Shakespearean Insult Generation - Phrase Generation
5.3 Phrases - Long (A Line or Sentence)
5.3.1 Iambic Phrase Analysis: Shakespeare
5.3.2 Natural Language Processing
5.3.3 Sentence and Story Generation: Tarot
5.4 Exercises

6 Analysis of Sequential Data Using HMMs
6.1 Hidden Markov Models (HMMs)
6.1.1 Background and Role in Stochastic Sequential Analysis (SSA)
6.1.2 When to Use a Hidden Markov Model (HMM)?
6.1.3 Hidden Markov Models (HMMs) - Standard Formulation and Terms
6.2 Graphical Models for Markov Models and Hidden Markov Models
6.2.1 Hidden Markov Models
6.2.2 Viterbi Path
6.2.3 Forward and Backward Probabilities
6.2.4 HMM: Maximum Likelihood Discrimination
6.2.5 Expectation/Maximization (Baum-Welch)
6.3 Standard HMM Weaknesses and Their GHMM Fixes
6.4 Generalized HMMs (GHMMs - "Gems"): Minor Viterbi Variants
6.4.1 The Generic HMM
6.4.2 pMM/SVM
6.4.3 EM and Feature Extraction via EVA Projection
6.4.4 Feature Extraction via Data Absorption (a.k.a. Emission Inversion)
6.4.5 Modified AdaBoost for Feature Selection and Data Fusion
6.5 HMM Implementation for Viterbi (in C and Perl)
6.6 Exercises

7 Generalized HMMs (GHMMs): Major Viterbi Variants
7.1 GHMMs: Maximal Clique for Viterbi and Baum-Welch
7.2 GHMMs: Full Duration Model
7.2.1 HMM with Duration (HMMD)
7.2.2 Hidden Semi-Markov Models (HSMM) with Side-Information
7.2.3 HMM with Binned Duration (HMMBD)
7.3 GHMMs: Linear Memory Baum-Welch Algorithm
7.4 GHMMs: Distributable Viterbi and Baum-Welch Algorithms
7.4.1 Distributed HMM Processing via "Viterbi-Overlap-Chunking" with GPU Speedup
7.4.2 Relative Entropy and Viterbi Scoring
7.5 Martingales and the Feasibility of Statistical Learning (Further Details in Appendix)
7.6 Exercises

8 Neuromanifolds and the Uniqueness of Relative Entropy
8.1 Overview
8.2 Review of Differential Geometry
8.2.1 Differential Topology - Natural Manifold
8.2.2 Differential Geometry - Natural Geometric Structures
8.3 Amari's Dually Flat Formulation
8.3.1 Generalization of Pythagorean Theorem
8.3.2 Projection Theorem and Relation Between Divergence and Link Formalism
8.4 Neuromanifolds
8.5 Exercises

9 Neural Net Learning and Loss Bounds Analysis
9.1 Brief Introduction to Neural Nets (NNs)
9.1.1 Single Neuron Discriminator
9.1.2 Neural Net with Back-Propagation
9.2 Variational Learning Formalism and Use in Loss Bounds Analysis
9.2.1 Variational Basis for Update Rule
9.2.2 Review and Generalization of GD Loss Bounds Analysis
9.2.3 Review of the EG Loss Bounds Analysis
9.3 The sinh⁻¹(omega) Link Algorithm (SA)
9.3.1 Motivation for the sinh⁻¹(omega) Link Algorithm (SA)
9.3.2 Relation of the sinh Link Algorithm to the Binary Exponentiated Gradient Algorithm
9.4 The Loss Bounds Analysis for sinh⁻¹(omega)
9.4.1 Loss Bounds Analysis Using the Taylor Series Approach
9.4.2 Loss Bounds Analysis Using Taylor Series for the sinh Link (SA) Algorithm
9.5 Exercises

10 Classification and Clustering
10.1 The SVM Classifier - An Overview
10.2 Introduction to Classification and Clustering
10.2.1 Sum of Squared Error (SSE) Scoring
10.2.2 K-Means Clustering (Unsupervised Learning)
10.2.3 k-Nearest Neighbors Classification (Supervised Learning)
10.2.4 The Perceptron Recap (See Chapter 9 for Details)
10.3 Lagrangian Optimization and Structural Risk Minimization (SRM)
10.3.1 Decision Boundary and SRM Construction Using Lagrangian
10.3.2 The Theory of Classification
10.3.3 The Mathematics of the Feasibility of Learning
10.3.4 Lagrangian Optimization
10.3.5 The Support Vector Machine (SVM) - Lagrangian with SRM
10.3.6 Kernel Construction Using Polarization
10.3.7 SVM Binary Classifier Derivation
10.4 SVM Binary Classifier Implementation
10.4.1 Sequential Minimal Optimization (SMO)
10.4.2 Alpha-Selection Variants
10.4.3 Chunking on Large Datasets: O(N²) -> n O(N²/n²) = O(N²)/n
10.4.4 Support Vector Reduction (SVR)
10.4.5 Code Examples (in OO Perl)
10.5 Kernel Selection and Tuning Metaheuristics
10.5.1 The "Stability" Kernels
10.5.2 Derivation of "Stability" Kernels
10.5.3 Entropic and Gaussian Kernels Relate to Unique, Minimally Structured, Information Divergence and Geometric Distance Measures
10.5.4 Automated Kernel Selection and Tuning
10.6 SVM Multiclass from Decision Tree with SVM Binary Classifiers
10.7 SVM Multiclass Classifier Derivation (Multiple Decision Surface)
10.7.1 Decomposition Method to Solve the Dual
10.7.2 SVM Speedup via Differentiating BSVs and SVs
10.8 SVM Clustering
10.8.1 SVM-External Clustering
10.8.2 Single-Convergence SVM-Clustering: Comparative Analysis
10.8.3 Stabilized, Single-Convergence Initialized, SVM-External Clustering
10.8.4 Stabilized, Multiple-Convergence, SVM-External Clustering
10.8.5 SVM-External Clustering - Algorithmic Variants
10.9 Exercises

11 Search Metaheuristics
11.1 Trajectory-Based Search Metaheuristics
11.1.1 Optimal-Fitness Configuration Trajectories - Fitness Function Known and Sufficiently Regular
11.1.2 Optimal-Fitness Configuration Trajectories - Fitness Function Not Known
11.1.3 Fitness Configuration Trajectories with Nonoptimal Updates
11.2 Population-Based Search Metaheuristics
11.2.1 Population with Evolution
11.2.2 Population with Group Interaction - Swarm Intelligence
11.2.3 Population with Indirect Interaction via Artifact
11.3 Exercises

12 Stochastic Sequential Analysis (SSA)
12.1 HMM and FSA-Based Methods for Signal Acquisition and Feature Extraction
12.2 The Stochastic Sequential Analysis (SSA) Protocol
12.2.1 (Stage 1) Primitive Feature Identification
12.2.2 (Stage 2) Feature Identification and Feature Selection
12.2.3 (Stage 3) Classification
12.2.4 (Stage 4) Clustering
12.2.5 (All Stages) Database/Data-Warehouse System Specification
12.2.6 (All Stages) Server-Based Data Analysis System Specification
12.3 Channel Current Cheminformatics (CCC) Implementation of the Stochastic Sequential Analysis (SSA) Protocol
12.4 SCW for Detector Sensitivity Boosting
12.4.1 NTD with Multiple Channels (or High Noise)
12.4.2 Stochastic Carrier Wave
12.5 SSA for Deep Learning
12.6 Exercises

13 Deep Learning Tools - TensorFlow
13.1 Neural Nets Review
13.1.1 Summary of Single Neuron Discriminator
13.1.2 Summary of Neural Net Discriminator and Back-Propagation
13.2 TensorFlow from Google
13.2.1 Installation/Setup
13.2.2 Example: Character Recognition
13.2.3 Example: Language Translation
13.2.4 TensorBoard and the TensorFlow Profiler
13.2.5 Tensor Cores
13.3 Exercises

14 Nanopore Detection - A Case Study
14.1 Standard Apparatus
14.1.1 Standard Operational and Physiological Buffer Conditions
14.1.2 Alpha-Hemolysin Channel Stability - Introduction of Chaotropes
14.2 Controlling Nanopore Noise Sources and Choice of Aperture
14.3 Length Resolution of Individual DNA Hairpins
14.4 Detection of Single Nucleotide Differences (Large Changes in Structure)
14.5 Blockade Mechanism for 9bphp
14.6 Conformational Kinetics on Model Biomolecules
14.7 Channel Current Cheminformatics
14.7.1 Power Spectra and Standard EE Signal Analysis
14.7.2 Channel Current Cheminformatics for Single-Biomolecule/Mixture Identifications
14.7.3 Channel Current Cheminformatics: Feature Extraction by HMM
14.7.4 Bandwidth Limitations
14.8 Channel-Based Detection Mechanisms
14.8.1 Partitioning and Translocation-Based ND Biosensing Methods
14.8.2 Transduction Versus Translation
14.8.3 Single-Molecule Versus Ensemble
14.8.4 Biosensing with High Sensitivity in Presence of Interference
14.8.5 Nanopore Transduction Detection Methods
14.9 The NTD Nanoscope
14.9.1 Nanopore Transduction Detection (NTD)
14.9.2 NTD: A Versatile Platform for Biosensing
14.9.3 NTD Platform
14.9.4 NTD Operation
14.9.5 Driven Modulations
14.9.6 Driven Modulations with Multichannel Augmentation
14.10 NTD Biosensing Methods
14.10.1 Model Biosensor Based on Streptavidin and Biotin
14.10.2 Model System Based on DNA Annealing
14.10.3 Y-Aptamer with Use of Chaotropes to Improve Signal Resolution
14.10.4 Pathogen Detection, miRNA Detection, and miRNA Haplotyping
14.10.5 SNP Detection
14.10.6 Aptamer-Based Detection
14.10.7 Antibody-Based Detection
14.11 Exercises

Appendix A: Python and Perl System Programming in Linux
A.1 Getting Linux and Python in a Flash (Drive)
A.2 Linux and the Command Shell
A.3 Perl Review: I/O, Primitives, String Handling, Regex

Appendix B: Physics
B.1 The Calculus of Variations

Appendix C: Math
C.1 Martingales
C.2 Hoeffding Inequality

References
Index
Stephen Winters-Hilt, PhD, is the sole proprietor of Meta Logos Systems, Albuquerque, NM, USA, a company specializing in machine learning, signal analysis, financial analytics, and bioinformatics. He holds a doctorate in theoretical physics from the University of Wisconsin and a second PhD, in computer science and bioinformatics, from the University of California, Santa Cruz.