
Data Profiling In Sparkflows

Updated: Dec 3

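Data profiling helps you understand a dataset before you work with it: the range and distribution of each column, how classes are balanced, and how columns relate to each other. Sparkflows provides a number of processors for profiling data.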


Workflow for Data Profiling


Below is a workflow which profiles the Telco Churn Dataset.



[Image: Sparkflows workflow for profiling the Telco Churn dataset]

Input Telco Churn Data


The input dataset looks like this:



[Image: sample rows of the input Telco Churn data]

Workflow Execution Result


Executing the above workflow produces the results below. Sparkflows runs data profiling in a distributed fashion, so the same workflow scales with the number of records in the input dataset.


Summary Statistics


[Image: summary statistics output]
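
For readers who want to see what a summary-statistics step typically computes, here is a minimal plain-Python sketch. It is not Sparkflows code and does not run in a distributed fashion. The rows and the column names (`day_mins`, `cust_serv_calls`) are made up for illustration and are not the actual Telco Churn schema.

```python
import statistics

# Hypothetical toy rows; column names are illustrative only.
rows = [
    {"day_mins": 100.0, "cust_serv_calls": 1},
    {"day_mins": 200.0, "cust_serv_calls": 3},
    {"day_mins": 300.0, "cust_serv_calls": 2},
]


def summarize(rows, column):
    """Return count, mean, sample standard deviation, min, and max for one numeric column."""
    # Skip missing values so they do not distort the statistics.
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation needs at least two values.
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


summary = {col: summarize(rows, col) for col in ("day_mins", "cust_serv_calls")}
for col, stats in summary.items():
    print(col, stats)
```

A distributed engine computes the same quantities per partition and then merges them, which is why this kind of profiling scales with the input size.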

Counts by Churned Column



[Image: counts by Churned column]
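
Counting records by the churn label is a group-by-and-count. The sketch below shows the idea in plain Python on made-up rows. The `churned` column name and its values are assumptions for illustration, and this is not how Sparkflows implements the step.

```python
from collections import Counter

# Hypothetical toy rows; the "churned" column name and its values are illustrative.
rows = [
    {"customer": "a", "churned": "False"},
    {"customer": "b", "churned": "True"},
    {"customer": "c", "churned": "False"},
    {"customer": "d", "churned": "False"},
]

# Group by the label and count the rows in each group.
counts = Counter(row["churned"] for row in rows)
total = sum(counts.values())

# Print each label with its count and its share of all rows.
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

The share per label is worth checking early, because a heavily imbalanced churn column affects how any later model should be evaluated.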

Graph of counts of various attributes for Churned and Not Churned customers



[Image: graph of attribute counts for Churned and Not Churned customers]

Correlation Matrix



[Image: correlation matrix]
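
The post does not say which correlation measure is used. Assuming Pearson correlation, a correlation matrix can be sketched in plain Python as below. The column names and values are made up for illustration, and the `pearson` helper is hypothetical, not part of Sparkflows.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy columns; names and values are illustrative only.
columns = {
    "day_mins": [100.0, 200.0, 300.0, 400.0],
    "day_charge": [17.0, 34.0, 51.0, 68.0],  # exactly proportional to day_mins
    "cust_serv_calls": [4, 3, 2, 1],         # decreases as day_mins increases
}

names = list(columns)
# Pairwise correlation of every column with every other column.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 2) for b in names])
```

Pairs with a correlation close to 1 or -1, such as the two proportional columns here, carry largely redundant information, which is one of the main things a correlation matrix is used to spot.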

Summary


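In this post, we saw how Sparkflows makes it easy to profile your datasets: summary statistics, counts by a label column, graphs of attribute counts, and a correlation matrix all come from a single workflow.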

To get started, download and install Sparkflows on your laptop. The only prerequisite is Java 8.

 
 
 
