ISBN 9798289704603 is currently unpriced. Please contact us for pricing.
Available options are listed below:

Learn Apache Spark: Build Scalable Pipelines with PySpark and Optimization

AUTHOR Smart Tech Content, Studiod21; Rodrigues, Diego
PUBLISHER Independently Published (06/26/2025)
PRODUCT TYPE Paperback

Description

LEARN APACHE SPARK: Build Scalable Pipelines with PySpark and Optimization

This book is designed for students, developers, data engineers, data scientists, and technology professionals who want to master Apache Spark in practice, across corporate environments, the public cloud, and modern integrations.

You will learn to build scalable pipelines for large-scale data processing, orchestrating distributed workloads with AWS EMR, Databricks, Azure Synapse, and Google Cloud Dataproc. The content covers integration with Hadoop, Hive, Kafka, SQL, Delta Lake, MongoDB, and Python, as well as advanced techniques in tuning, job optimization, real-time analysis, machine learning with MLlib, and workflow automation.

Includes:

- Implementation of ETL and ELT pipelines with Spark SQL and DataFrames

- Streaming data processing and integration with Kafka and AWS Kinesis

- Optimization of distributed jobs, performance tuning, and use of Spark UI

- Integration of Spark with S3, Data Lake, NoSQL, and relational databases

- Deployment on managed clusters in AWS, Azure, and Google Cloud

- Applied Machine Learning with MLlib, Delta Lake, and Databricks

- Automation of routines, monitoring, and scalability for Big Data

By the end, you will master Apache Spark as a professional solution for data analysis, process automation, and machine learning in complex, high-performance environments.

Content reviewed by A.I. with technical supervision.

apache spark, big data, pipelines, distributed processing, aws emr, databricks, streaming, etl, machine learning, cloud integration, Google Data Engineer, AWS Data Analytics, Azure Data Engineer, Big Data Engineer, MLOps, DataOps Professional

Product Format
Product Details
ISBN-13: 9798289704603
Binding: Paperback or Softback (Trade Paperback (US))
Content Language: English
More Product Details
Page Count: 258
Carton Quantity: 30
Product Dimensions: 6.00 x 0.54 x 9.00 inches
Weight: 0.77 pound(s)
Country of Origin: US
Subject Information
BISAC Categories
Computers | Languages - SQL