Skip to main content

Spark SQL basics

Spark SQL

Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure of both the data and the computation being performed. Internally, Spark SQL uses this extra information to perform extra optimizations. There are several ways to interact with Spark SQL including SQL and the Dataset API. When computing a result the same execution engine is used, independent of which API/language you are using to express the computation. This unification means that developers can easily switch back and forth between different APIs based on which provides the most natural way to express a given transformation.

Creating free databricks cluster

You can create free spark cluster on databricks community cloud. You can sign up there and start creating your cluster. Your cluster will automatically gets dropped after 120 minutes of inactivity. You can create a new cluster every-time you work. The cluster which you create on the community cloud will have only one node and can be useful for learning purpose only. (Create Cluster)

 Mapping Azure storage and query data

There are two ways of accessing data from azure storage.
  • Mount storage in databricks FS.
  • Use spark conf context to access storage.
Here we will try to access a csv data using spark conf context and data frame.

Configuring azure storage using PySpark

You can use the below code to configure azure storage.  (Steps to create azure storage account)


account = "vehicleparking" container = "salesdata" spark.conf.set( "fs.azure.account.key.vehicleparking.blob.core.windows.net", "") path = "wasbs://"+ container + "@"+ account +".blob.core.windows.net/Sales.csv"

Code to read csv from azure blob storage

ds = spark.read.csv(path,header="true",inferSchema="true"); ds.createOrReplaceTempView("SalesData") mydata = spark.sql("SELECT * FROM SalesData WHERE City ='Hyderabad'") mydata.show()

You overall code looks like the below 


Comments

Popular posts from this blog

DataZen Syllabus

INTRODUCTION TO DATAZEN PRODUCT ELEMENTS ARCHITECTURE DATAZEN ENTERPRISE SERVER INTRODUCTION SERVER ARCHITECTURE INSTALLATION SECURITY CONTROL PANEL WEB VIEWER SERVER ADMINISTRATION CREATING AND PUBLISHING DASHBOARDS CONNECTING TO DATASOURCES DESIGNER CONFIGURING NAVIGATOR CONFIGURING VISUALIZATION  PUBLISHING DASHBOARD WORKING WITH MAP  WORKING WITH DRILL THROUGH DASHBOARDS

PowerBI Interview Questions and Answers

Power BI Interview Questions – General Questions 1). What is self-service business intelligence? Ans: Self-Service Business Intelligence (SSBI) is an approach to data analytics that enables business users to filter, segment, and, analyse their data, without the in-depth technical knowledge in statistical analysis, business intelligence (BI). SSBI has made it easier for end users to access their data and create various visuals to get better business insights. Anybody who has basic understanding of the data can create reports to build intuitive and shareable dashboards. 2). What are the parts of Microsoft self-service business intelligence solution? Ans: Microsoft has two parts for Self-Service BI  Excel BI Toolkit – It allows users to create interactive report by importing data from different sources and model data according to report requirement.  Power BI – It is the online solution that enables you to share the interactive reports and queries that you have created using ...

MS BI Syllabus

Microsoft Business Intelligence Course Syllabus SSRS – SQL Server Reporting Services  Getting Started 1. Understanding Reporting (Authoring,Management,Delivery) 2. Installing Reporting (Native Mode, SharePoint Integration mode) 3. Building your first report  Authoring Reports 1. Developing Basic Reports (RDL,wizard,designer,datasource,dataset,formatting) 2. Working with expressions (expression to calculate value, Agg functions, exp for objects) 3. Organizing Data (Data Regions, Table, Matrix, Chart, List) 4. Advance Report (Parameter, drill down, drill through, links, 5. Report Model (Data Source, Data Source View, Model , Report Builder 3.0)  Managing Report ( Report Manager) 1. Managing Content (deploying report, folders, linked reports, datasources, value etc) 2. Managing Security (Item Level , Site navigation, localhost – sql) 3. Managing Server Config (Config Manager, Report Manager, Report Server DB)  Delivering Report 1. Accessing Report (Viewing...