Thursday, April 30, 2020

Basic data browsing in TM1

Cube Viewer
The following steps can be followed to browse data in the Cube Viewer:
  • Open TM1 Server Explorer.
  • In the Tree pane, select the cube you want to browse
  • Click on the cube and select ..:
  • The Cube Viewer window opens with the cube’s system default view.
  • Press F9 or click on Recalculate to display the cell values.
In-Spreadsheet Browser
To access data using this tool the following steps should be followed:
  1. Open Server Explorer.
  2. In the Tree pane, select the cube you want to browse.
  3. Click on the cube and select Browse in Excel.
Modifying your view of data
To do this, the Cube Viewer allows a user to perform the following:
  • Stack dimensions to see more detail along the columns or rows of a view
  • Drill down on the consolidated elements displayed in the view to see the underlying detail
  • Change the title dimension elements in the view to access a completely different view of the cube data
  • Drill-through the view to detailed data
Stacking a title dimension as a row dimension
To stack a title dimension as a row dimension in Cube Viewer, you can click on the element name in the title dimension and drag that element name to the right or left of a row dimension name.
Stacking a title dimension as a column dimension
To stack a title dimension as a column dimension in Cube Viewer, you can click on the element name in the title dimension and drag that element name to the right or left of a column dimension name.
Drilling down through consolidations
In Cube Viewer, a plus sign (+) next to an element name identifies the element as a consolidation. To view the underlying detail, click on the plus sign; to hide it again, click on the minus sign (-).
Changing title elements in Cube Viewer
To change title elements displayed in Cube Viewer, click on the element name arrow, select an element, and press F9.
Drilling through to detailed data
To access detailed level data for a selected cell of data in Cube Viewer, a drill TI process and drill association rule must be implemented.
Saving the view
To save a view from Cube Viewer, you can click on File and then select Save. TM1 will prompt you with the TM1 Save View dialog. This dialog allows you to:
  • Set this view as the cube’s default view (only one default view per cube is allowed)
  • Save the view as a private or public view
  • Provide a name for the view
Formatting view cells
The most common features provided by TM1 to format data displayed in a cube view are:
  • Zero suppression
  • Cell formatting
  • Column orientations
Zero suppression
You can hide rows and/or columns that contain only zero values.
For row suppression, click on Options | Suppress Zeroes On Rows.
For column suppression, click on Options | Suppress Zeroes On Columns.
Cell formatting
To customize the display of data within a cube view cell, the Format attribute can be used. The Format attribute is a special type of element attribute that determines how data is displayed. The value of the Format attribute can be applied to column elements, row elements, or title elements.
Column orientations
You can switch the column orientation of a view between left-to-right and right-to-left by clicking on the menu option Layout Right to Left (or Layout Left to Right, depending upon the current orientation of the view).
Worksheets
From TM1 Cube Viewer, data can be sliced into Microsoft Excel for analysis and further presentation.
Slicing a view
To slice a view from Cube Viewer to an Excel worksheet, you can click on File and then select Slice. TM1 will copy the current view into a new MS Excel workbook. Sliced views are still connected to TM1, because TM1 adds the appropriate TM1 functions to the worksheet to retrieve the data from the cube.
Snapshots
From TM1 Cube Viewer, data can be copied into Microsoft Excel for analysis and further presentation. In contrast to the slice, the snapshot is a copy of the data at that moment in time. To create a snapshot from Cube Viewer, you can click on File and then select Snapshot.
Active Forms
Active Forms let you view (and update) cube data directly in Excel whenever you are connected to the server on which the cube data resides.
There are special groups of TM1 worksheet functions that are used for the support of Active Forms. These groups include:
  • TM1RptView
  • TM1RptTitle
  • TM1RptRow
  • TM1RptFilter
  • TM1RptElLev
  • TM1RptElIsExpanded
  • TM1RptElIsConsolidated
Refreshing and recalculating
Once you have saved an Active Form, you can refresh the data displayed in it by pressing F9. You also have the option to rebuild the current worksheet or the current workbook by clicking on Active Form and then selecting the corresponding rebuild option, such as Rebuild Current Sheet.
Deleting an Active Form
To remove an Active Form from a worksheet, you must click in the data area of an Active Form and then click on Active Form | Delete.
Active Forms and special formatting
  • Suppressing zero values: You can selectively suppress or display rows containing only zero values in an Active Form.
  • Spreading and holding data: All data spreading and holding operations are fully supported within Active Forms.
  • Drill: The ability to drill-to related data displayed in an Active Form is supported (drill processes and rules must first be set up).
  • Editing of the row subsets used: The row subset for an Active Form is defined by the TM1RptRow function and can be changed.
  • Static lists: The row subset within an Active Form can be saved as a static list by clicking on Active Form and then selecting Save Row Elements as Static List.
  • Changing the title element: You can access a completely different view of cube data in the Active Form by double-clicking a title element in the Active Form, then selecting a new element and clicking on OK.
  • Inserting a dependent section: An Active Form can be split into multiple sections to access additional data. An additional Active Section would use the same column and title dimensions as the parent Active Form with which it is associated but has unique row elements.
  • Insertion of columns: You can insert a column within an Active Form directly within the Active Form, to the left of the Active Form, or to the right of the Active Form.
Formatting Active Forms
Active Form formatting is defined via a format range which is hidden within the Active Form worksheet. This hidden range is revealed by clicking on Active Form and then selecting Show Format Area.
Active Form default formatting
Active Forms must have the default formatting defined in rows 1 to 8 where row 1 will always have the Begin Format Range label and row 8 will always have the End Format Range label:
Modifying default formatting
Even though the formatting range for an Active Form is initially defined in rows 2 through 7, you can create multiple additional format definitions and insert them between the Begin Format Range and End Format Range labels of the Active Form:
  1. Click on the End Format Range label.
  2. From the Excel Insert menu, click on Row.
  3. Use the Excel Format Cells dialog box to apply formatting to the cells in the new formatting row.
  4. In column A, assign a unique format definition label to the formatting row.
Active Form limitations
There are some limitations with using Active Forms:
  • Worksheet names cannot include a dash (-) character.
  • Merging cells in an Active Form always requires a rebuild of the worksheet or workbook.
  • Active Forms require at least one row dimension.
TM1 reports
TM1 reports must be created from a TM1 slice (described earlier in this chapter). From there, TM1 reports utilize a Report Generation Wizard to:
  • Select the worksheets to be included in the report
  • Select the title dimensions and subsets for the report
  • Select workbook print options
  • Select a print destination for the report (printer, Excel file, or PDF file)
  • Save the report settings
TM1 Print Report Wizard
To use the reporting wizard, from the menu bar, click on TM1, then select the Print Report option. From the Print Report Wizard dialog, you can select the desired options and click on Next to step through and set up all of the report options and then click on Finish.
Selecting the sheets
You can use TM1 Print Report Wizard to select any (or all) of the worksheets from the current Excel workbook to include in your report. In the wizard, you can select the checkbox of the worksheet that you want to include in the report or you can click on Select All to include all of the worksheets in the current workbook in the report.
Selecting the title dimensions
You can select the title dimensions to be included in the report by selecting and moving them from the Available Title Dimensions list to the Selected Title Dimensions list.

Selecting workbook print options
  • Print a single workbook or print multiple workbooks: You can use TM1 Print Report Wizard to print a single workbook or even multiple workbooks.
  • Selecting a print destination: The final step in using TM1 Print Report Wizard to create a report is to select a print destination of a printer, Excel file, or PDF file. Printing options available include the printer name, number of copies, print to file, and collate.
  • Saving the TM1 report as an Excel or PDF document: You can select the Save as Excel Files or Save as PDF Files option on the third screen of the TM1 Print Report Wizard to save the report in the desired format.
For these options you can provide the following:
  1. Generate a new workbook for each title.
  2. Generate a file name for Excel or PDF.
  3. Generate a directory name or location in which to save the output file(s).
  4. For Excel output files, specify if the generated report will include TM1 formulas to access TM1 data in the future.
TM1 Web
TM1 Web provides access to cube data from a browser. It lets you view and edit data through Excel reports; drill, pivot, select, and filter TM1 data; display charts sourced from cube data; and even perform some TM1 Server administration tasks.
TM1 Websheets
It should be understood that the websheet version of the Excel sheet will have various visual differences but it does support the following Excel features:
  • Hiding columns
  • Conditional formatting
  • Hyperlinking
  • Freeze panes
  • Cell protection
In this websheet a user can:
  • Enter data in cells to which they have write access
  • Use the data spreading feature of TM1
  • Drill to relational tables or other cubes
  • View Excel charts
  • Manipulate title element subsets in the Subset Editor
TM1 Web Cube Viewer
To utilize Cube Viewer within TM1 Web you can:
  • Log in to TM1 Web and open Views in the Navigation pane
  • Expand the cube views that you are interested in and click on the view that you wish to access
Creating a new TM1 Web cube view
To create a new TM1 Web cube view, you can use the TM1 Web View Builder Wizard.
TM1 Charts
To view cube data in TM1 Web in chart format, you can click on View Chart, View Chart and Grid, or View Grid from the toolbar.
From the TM1 Web chart display, you can click on Chart Properties to change the chart type, colors, legend, and 3D view elements.
Custom display formats
We reviewed element attributes and the Format attribute earlier. To revisit, this attribute allows the definition of a number of formats for numbers, dates, times, and strings. If you right-click on a dimension, click on Edit Element Attributes, and then click on the cell at the intersection of the Format column and the element for which you want to define a display format, TM1 will allow you to select a standard format for that element or define your own. These formats will be applied to all cells in the defined data intersection.

Android - Drag and Drop

The Android drag/drop framework allows your users to move data from one View to another View in the current layout using a graphical drag and drop gesture. As of API 11, dragging and dropping a view onto other views or view groups is supported. The framework includes the following three important components to support drag and drop functionality −
  • Drag event class.
  • Drag listeners.
  • Helper methods and classes.

The Drag/Drop Process

There are basically four steps or states in the drag and drop process −
  • Started − This event occurs when you start dragging an item in a layout. Your application calls the startDrag() method to tell the system to start a drag. The arguments to startDrag() provide the data to be dragged, metadata for this data, and a callback for drawing the drag shadow.
      • The system first responds by calling back to your application to get a drag shadow. It then displays the drag shadow on the device.
      • Next, the system sends a drag event with action type ACTION_DRAG_STARTED to the registered drag event listeners for all the View objects in the current layout.
      • To continue to receive drag events, including a possible drop event, a drag event listener must return true. If the drag event listener returns false, it will not receive drag events for the current operation until the system sends a drag event with action type ACTION_DRAG_ENDED.
  • Continuing − The user continues the drag. The system sends an ACTION_DRAG_ENTERED action followed by ACTION_DRAG_LOCATION actions to the registered drag event listener of the View that the drag point enters. The listener may alter its View object's appearance in response to the event, for example by highlighting the View.
      • The drag event listener receives an ACTION_DRAG_EXITED action after the user has moved the drag shadow outside the bounding box of the View.
  • Dropped − The user releases the dragged item within the bounding box of a View. The system sends the View object's listener a drag event with action type ACTION_DROP.
  • Ended − Just after the action type ACTION_DROP, the system sends out a drag event with action type ACTION_DRAG_ENDED to indicate that the drag operation is over.
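The event order above can be traced without a device. What follows is a minimal plain-Java model, not Android code: the Action enum mirrors the DragEvent action names, while DragModel, Listener, and simulate are hypothetical names introduced only for illustration. It assumes a single View that the drag point enters and on which the item is dropped, and it models the rule that a listener returning false for ACTION_DRAG_STARTED receives nothing further until ACTION_DRAG_ENDED.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the drag event sequence; hypothetical, not the Android API.
class DragModel {
    // Names mirror the action types defined by android.view.DragEvent.
    enum Action {
        ACTION_DRAG_STARTED, ACTION_DRAG_ENTERED, ACTION_DRAG_LOCATION,
        ACTION_DROP, ACTION_DRAG_ENDED
    }

    // Hypothetical stand-in for View.OnDragListener.
    interface Listener {
        boolean onDrag(Action action);
    }

    // Replays one drag that enters the view and is dropped on it.
    // Returns the actions that the listener actually received.
    static List<Action> simulate(Listener listener) {
        List<Action> received = new ArrayList<>();

        // Every registered listener is offered the start of the drag.
        received.add(Action.ACTION_DRAG_STARTED);
        boolean interested = listener.onDrag(Action.ACTION_DRAG_STARTED);

        // Only listeners that returned true keep receiving events.
        if (interested) {
            Action[] middle = {
                Action.ACTION_DRAG_ENTERED, Action.ACTION_DRAG_LOCATION, Action.ACTION_DROP
            };
            for (Action action : middle) {
                received.add(action);
                listener.onDrag(action);
            }
        }

        // The end of the operation is reported in either case.
        received.add(Action.ACTION_DRAG_ENDED);
        listener.onDrag(Action.ACTION_DRAG_ENDED);
        return received;
    }
}
```

In this model, DragModel.simulate(a -> true) receives all five actions in order, while DragModel.simulate(a -> false) receives only ACTION_DRAG_STARTED and ACTION_DRAG_ENDED.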

The DragEvent Class

The DragEvent class represents an event that is sent out by the system at various times during a drag and drop operation. It provides a few constants and important methods that are used during the drag/drop process.

Constants

Following are all the integer constants available as part of the DragEvent class.
ACTION_DRAG_STARTED
Signals the start of a drag and drop operation.
ACTION_DRAG_ENTERED
Signals to a View that the drag point has entered the bounding box of the View.
ACTION_DRAG_LOCATION
Sent to a View after ACTION_DRAG_ENTERED if the drag shadow is still within the View object's bounding box.
ACTION_DRAG_EXITED
Signals that the user has moved the drag shadow outside the bounding box of the View.
ACTION_DROP
Signals to a View that the user has released the drag shadow, and the drag point is within the bounding box of the View.
ACTION_DRAG_ENDED
Signals to a View that the drag and drop operation has concluded.

Methods

Following are a few important and frequently used methods available as part of the DragEvent class.
int getAction()
Returns the action value of this event.
ClipData getClipData()
Returns the ClipData object sent to the system as part of the call to startDrag().
ClipDescription getClipDescription()
Returns the ClipDescription object contained in the ClipData.
boolean getResult()
Returns an indication of the result of the drag and drop operation.
float getX()
Gets the X coordinate of the drag point.
float getY()
Gets the Y coordinate of the drag point.
String toString()
Returns a string representation of this DragEvent object.

Listening for Drag Event

If you want a view in your layout to respond to drag events, either register a View.OnDragListener on it or override its onDragEvent(DragEvent) callback method. When the system calls the listener or the method, it passes a DragEvent object, as explained above. A View object can have both a listener and a callback method. If it does, the system calls the listener first and calls the callback only if the listener returns false.
The combination of the onDragEvent(DragEvent) method and View.OnDragListener is analogous to the combination of the onTouchEvent() and View.OnTouchListener used with touch events in old versions of Android.

Starting a Drag Event

You start with creating a ClipData and ClipData.Item for the data being moved. As part of the ClipData object, supply metadata that is stored in a ClipDescription object within the ClipData. For a drag and drop operation that does not represent data movement, you may want to use null instead of an actual object.
Next, you can either extend View.DragShadowBuilder to create a custom drag shadow for the view being dragged, or simply use View.DragShadowBuilder(View) to create a default drag shadow that is the same size as the View argument passed to it, with the touch point centered in the drag shadow.

Example

The following example shows the functionality of a simple drag and drop using View.setOnLongClickListener(), View.setOnTouchListener(), and View.setOnDragListener().
  1. Use the Android Studio IDE to create an Android application and name it My Application under the package com.example.saira_000.myapplication.
  2. Modify the src/MainActivity.java file and add the code to define event listeners as well as callback methods for the logo image used in the example.
  3. Copy the image abc.png into the res/drawable-* folders. You can use images with different resolutions in case you want to provide them for different devices.
  4. Modify the layout XML file res/layout/activity_main.xml to define the default view of the logo image.
  5. Run the application to launch the Android emulator and verify the result of the changes made to the application.
Following is the content of the modified main activity file src/MainActivity.java. This file can include each of the fundamental lifecycle methods.
package com.example.saira_000.myapplication;

import android.app.Activity;

import android.content.ClipData;
import android.content.ClipDescription;

import android.os.Bundle;
import android.util.Log;

import android.view.DragEvent;
import android.view.Menu;
import android.view.MenuItem;
import android.view.MotionEvent;
import android.view.View;

import android.widget.ImageView;
import android.widget.RelativeLayout;


public class MainActivity extends Activity {
   ImageView img;
   String msg = "MainActivity";   // tag passed to the Log.d calls below
   private android.widget.RelativeLayout.LayoutParams layoutParams;
   
   @Override
   protected void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.activity_main);
      img=(ImageView)findViewById(R.id.imageView);
      
      img.setOnLongClickListener(new View.OnLongClickListener() {
         @Override
         public boolean onLongClick(View v) {
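            // The layout below does not set a tag on the view, so getTag() may
            // return null; String.valueOf avoids a NullPointerException.
            CharSequence label = String.valueOf(v.getTag());
            ClipData.Item item = new ClipData.Item(label);
            String[] mimeTypes = {ClipDescription.MIMETYPE_TEXT_PLAIN};
            
            ClipData dragData = new ClipData(label, mimeTypes, item);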
            View.DragShadowBuilder myShadow = new View.DragShadowBuilder(img);
            
            v.startDrag(dragData,myShadow,null,0);
            return true;
         }
      });
      
      img.setOnDragListener(new View.OnDragListener() {
         @Override
         public boolean onDrag(View v, DragEvent event) {
            switch(event.getAction()) {
               case DragEvent.ACTION_DRAG_STARTED:
               layoutParams = (RelativeLayout.LayoutParams)v.getLayoutParams();
               Log.d(msg, "Action is DragEvent.ACTION_DRAG_STARTED");
               
               // Do nothing
               break;
               
               case DragEvent.ACTION_DRAG_ENTERED:
               Log.d(msg, "Action is DragEvent.ACTION_DRAG_ENTERED");
               int x_cord = (int) event.getX();
               int y_cord = (int) event.getY();
               break;
               
               case DragEvent.ACTION_DRAG_EXITED :
               Log.d(msg, "Action is DragEvent.ACTION_DRAG_EXITED");
               x_cord = (int) event.getX();
               y_cord = (int) event.getY();
               layoutParams.leftMargin = x_cord;
               layoutParams.topMargin = y_cord;
               v.setLayoutParams(layoutParams);
               break;
               
               case DragEvent.ACTION_DRAG_LOCATION  :
               Log.d(msg, "Action is DragEvent.ACTION_DRAG_LOCATION");
               x_cord = (int) event.getX();
               y_cord = (int) event.getY();
               break;
               
               case DragEvent.ACTION_DRAG_ENDED   :
               Log.d(msg, "Action is DragEvent.ACTION_DRAG_ENDED");
               
               // Do nothing
               break;
               
               case DragEvent.ACTION_DROP:
               Log.d(msg, "ACTION_DROP event");
               
               // Do nothing
               break;
               default: break;
            }
            return true;
         }
      });
      
      img.setOnTouchListener(new View.OnTouchListener() {
         @Override
         public boolean onTouch(View v, MotionEvent event) {
            if (event.getAction() == MotionEvent.ACTION_DOWN) {
               ClipData data = ClipData.newPlainText("", "");
               View.DragShadowBuilder shadowBuilder = new View.DragShadowBuilder(img);
               
               img.startDrag(data, shadowBuilder, img, 0);
               img.setVisibility(View.INVISIBLE);
               return true;
            } else {
               return false;
            }
         }
      });
   }
}
Following will be the content of res/layout/activity_main.xml file −
In the following code abc indicates the logo of tutorialspoint.com
<RelativeLayout xmlns:android="http://schemas.android.com/apk/res/android"
   xmlns:tools="http://schemas.android.com/tools" 
   android:layout_width="match_parent"
   android:layout_height="match_parent" 
   android:paddingLeft="@dimen/activity_horizontal_margin"
   android:paddingRight="@dimen/activity_horizontal_margin"
   android:paddingTop="@dimen/activity_vertical_margin"
   android:paddingBottom="@dimen/activity_vertical_margin" 
   tools:context=".MainActivity">
   
   <TextView
      android:layout_width="wrap_content"
      android:layout_height="wrap_content"
      android:text="Drag and Drop Example"
      android:id="@+id/textView"
      android:layout_alignParentTop="true"
      android:layout_centerHorizontal="true"
      android:textSize="30dp" />
      
   <TextView
      android:layout_width="wrap_content"
      android:layout_height="wrap_content"
      android:text="Tutorials Point"
      android:id="@+id/textView2"
      android:layout_below="@+id/textView"
      android:layout_centerHorizontal="true"
      android:textSize="30dp"
      android:textColor="#ff14be3c" />
      
   <ImageView
      android:layout_width="wrap_content"
      android:layout_height="wrap_content"
      android:id="@+id/imageView"
      android:src="@drawable/abc"
      android:layout_below="@+id/textView2"
      android:layout_alignRight="@+id/textView2"
      android:layout_alignEnd="@+id/textView2"
      android:layout_alignLeft="@+id/textView2"
      android:layout_alignStart="@+id/textView2" />

</RelativeLayout>
Following will be the content of res/values/strings.xml, which defines the app_name string −
<?xml version="1.0" encoding="utf-8"?>
<resources>
   <string name="app_name">My Application</string>
</resources>
Following is the default content of AndroidManifest.xml −
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
   package="com.example.saira_000.myapplication" >
      
   <application
      android:allowBackup="true"
      android:icon="@drawable/ic_launcher"
      android:label="@string/app_name"
      android:theme="@style/AppTheme" >
      
      <activity
         android:name=".MainActivity"
         android:label="@string/app_name" >
      
         <intent-filter>
            <action android:name="android.intent.action.MAIN" />
            <category android:name="android.intent.category.LAUNCHER" />
         </intent-filter>
      
      </activity>
      
   </application>
</manifest>
Let's try to run your My Application application. I assume you created your AVD while doing the environment setup. To run the app from Android Studio, open one of your project's activity files and click the Run icon in the toolbar. Android Studio installs the app on your AVD and starts it. If everything is fine with your setup and application, it will display the following emulator window −

Android Drag and Drop
Now long-click the displayed TutorialsPoint logo. After about one second, the logo image moves slightly from its place; that is when you should start dragging the image. You can drag it around the screen and drop it at a new location.

Android Drop to New Location


Tuesday, April 28, 2020

Spark RDD Optimization Techniques In Hadoop

Welcome to the lesson ‘Spark RDD Optimization Techniques’ of Big Data Hadoop Tutorial which is a part of ‘big data hadoop online training’ offered by OnlineItGuru.
In this lesson, we will look into the lineage of Resilient Distributed Datasets or RDDs and discuss how optimization and performance improvement can be achieved by using the Apache Spark technique of persisting the RDDs.
Let us look at the objectives of this lesson.

Objectives

After completing this lesson, you will be able to:
  • Explain how RDD lineage is created
  • Explain how to mark persistence on an RDD
  • Explain the features of RDD persistence
  • List the storage levels in RDD persistence
  • Describe how distributed persistence and fault tolerance help avoid data loss
In the next section of this Spark tutorial, we will learn the concept of RDD Lineage.

Resilient Distributed Dataset (RDD) Lineage

RDD lineage is built whenever transformations are applied on an RDD. Let us understand this concept with the help of an example.

Creating Child RDDs

In this example, the first step is creating an RDD named mydata by reading the text file simplilearn.txt.
Step 1: Creating an RDD named mydata
To create the RDD mydata, execute the sc.textFile command, as shown in the first line of code in this section. Observe that the RDD is created, and it is linked to the base file. However, the RDD is empty at this time.
Step 2: Executing a transformation operation
The second step is to execute a transformation operation to convert the contents of simplilearn.txt to uppercase, as shown in the second line of code.
A child RDD, MappedRDD2, is then created and linked to the base text file. This is also empty at this time.
Step 3: Executing a transformation operation to filter the sentences
The third step is to execute a transformation operation to filter the sentences from MappedRDD2 that begin with an uppercase “I”, as shown in the second line of the code.
The RDD myrdd is now created and linked to the parent MappedRDD2, but it is also empty at this time. Each time a transformation operation is executed, a child RDD is created. Thus, the RDD lineage is built.
Next, an action is executed. In the example, the Count action is executed, as shown in the third line of code. The command computes and returns the value.
The Count action is executed on the last RDD, which is myrdd. Only when the action is executed are the previously issued transformation commands run on the RDDs, which are then populated with the data.
Step 4: Executing the Count Action
In the fourth step, the Count action is executed again. This requires all three steps in the RDD lineage (reading the file and the two transformations) to be executed again, because the computed data is not stored.
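The cost of repeating an action can be made visible with a counter. The following is a minimal plain-Java model of the idea, not Spark code: LazyLines is a hypothetical class whose map and filter methods only record a link to their parent, and whose count method walks the lineage back to the source on every call. An in-memory list stands in for the text file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical model of a lazily evaluated dataset; not the Spark API.
class LazyLines {
    // Counts every step (read, map, filter) that actually runs.
    static int computations = 0;

    // Recipe for producing this dataset's contents from its parent.
    private final Supplier<List<String>> compute;

    private LazyLines(Supplier<List<String>> compute) {
        this.compute = compute;
    }

    // Base dataset: linked to its source, but nothing is read yet.
    static LazyLines fromSource(List<String> source) {
        return new LazyLines(() -> {
            computations++;
            return new ArrayList<>(source);
        });
    }

    // Transformation: returns a child that remembers this parent.
    LazyLines map(Function<String, String> f) {
        return new LazyLines(() -> {
            List<String> in = compute.get();   // ask the parent first
            computations++;
            List<String> out = new ArrayList<>();
            for (String s : in) out.add(f.apply(s));
            return out;
        });
    }

    // Transformation: same lazy pattern as map.
    LazyLines filter(Predicate<String> p) {
        return new LazyLines(() -> {
            List<String> in = compute.get();
            computations++;
            List<String> out = new ArrayList<>();
            for (String s : in) if (p.test(s)) out.add(s);
            return out;
        });
    }

    // Action: runs the whole lineage every time it is called.
    long count() {
        return compute.get().size();
    }
}
```

With mydata built by LazyLines.fromSource(...) and myrdd defined as mydata.map(String::toUpperCase).filter(s -> s.startsWith("I")), the counter stays at 0 until count() is called. Each call to myrdd.count() then adds 3, one for the read and one for each transformation.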
Let us look at RDD persistence in the next section.

What is Spark RDD Persistence?

RDD persistence is an optimization technique for Apache Spark. It helps avoid re-computation of the whole lineage and saves the data by default in the memory. This improves performance.
Let us now observe how saved data is used for “Action” operations and how re-computation of the entire lineage is avoided when persistence is used on the RDD.

Marking an RDD for Persistence

To mark an RDD for persistence, use the persist() method. Once this is done, the data of that RDD is saved, by default in memory, the first time it is computed. Observe the example in this section.
Step 1: Creating the RDD mydata
The first step is creating the RDD mydata by reading the text file simplilearn.txt.
Step 2: Executing the transformation
The second step is to execute the transformation to convert the contents of the text file to upper case as shown in the second line of the code. The child RDD myrdd1 is created.
Step 3: Mark the RDD for persistence
In this step, we mark the RDD for persistence. In the third line of the code, myrdd1 is marked for persistence. So, this data will be stored in memory.
Step 4: Filtering the sentences from myrdd1
The fourth step is to filter the sentences from myrdd1 that begin with an upper case “I.” This creates the child RDD myrdd2. However, myrdd2 is not marked for persistence. Hence, the data of myrdd2 will not be stored.
Observe that all the child RDDs created when the transformations are executed are empty and will be populated on the execution of an Action.
Step 5: Optimization with RDD Persistence
In this step, we execute the Count action, and the entire lineage is computed. However, the data of myrdd1 will be saved because it was marked for persistence.
Step 6: Executing the Count Action
In the sixth step, the Count action is executed again. This time, however, the transformations of the entire lineage are not recomputed. Only the transformation applied for myrdd2 is computed, because the Count action depends on the content of myrdd2.
myrdd2 depends on myrdd1, but since myrdd1 was marked for persistence, the Count action uses the saved data and outputs the result.
Observe that the transformation on mydata and the read of the base file are not recomputed. Thus, optimization is achieved, and performance is improved by the use of RDD persistence.
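The effect of marking one RDD in the lineage can be sketched with the same kind of counter. The following is a plain-Java model under stated assumptions, not Spark code: CachedLines, transform, and materialize are hypothetical names, and an in-memory list stands in for the text file. Marking with persist() only sets a flag; the data is saved the first time an action computes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical model of persistence on a lazy dataset; not the Spark API.
class CachedLines {
    // Counts every step (read or transformation) that actually runs.
    static int computations = 0;

    private final Supplier<List<String>> compute;
    private boolean persistRequested = false;
    private List<String> saved = null;   // filled on first computation if marked

    private CachedLines(Supplier<List<String>> compute) {
        this.compute = compute;
    }

    // Base dataset: linked to its source, but nothing is read yet.
    static CachedLines fromSource(List<String> source) {
        return new CachedLines(() -> {
            computations++;
            return new ArrayList<>(source);
        });
    }

    // Transformation: the child keeps a link to this parent.
    CachedLines transform(Function<List<String>, List<String>> step) {
        return new CachedLines(() -> {
            List<String> in = materialize();   // parent data, saved or recomputed
            computations++;
            return step.apply(in);
        });
    }

    // Marking only sets a flag; nothing is computed or stored yet.
    CachedLines persist() {
        persistRequested = true;
        return this;
    }

    private List<String> materialize() {
        if (saved != null) return saved;       // reuse saved data, skip the lineage
        List<String> result = compute.get();
        if (persistRequested) saved = result;
        return result;
    }

    // Action: computes only what is not already saved.
    long count() {
        return materialize().size();
    }
}
```

If myrdd1 is an uppercase transform of mydata marked with persist(), and myrdd2 is an unmarked filter on myrdd1, the first myrdd2.count() runs 3 steps. The second call runs only 1, the filter, because myrdd1 is served from its saved data.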

Features of RDD Persistence

The various features of RDD Persistence are explained below in detail.
Storage and reuse of the RDD partitions:
RDD persistence facilitates storage and reuse of the RDD partitions. When an RDD is marked for persistence, each node stores in memory the partitions of the RDD that it computes. It then reuses them in other actions on the dataset, which makes those actions faster.
Automatic recomputation of lost RDD partitions:
If an RDD partition is lost, it is automatically recomputed using the original transformations. Thus, the cache is fault-tolerant.
Storage of persisted RDDs on different storage levels:
Each persisted RDD can be stored using a different storage level, which is determined by the StorageLevel object passed to the persist() method.
Let us now look at the different storage levels in RDD Persistence.

Storage Levels in RDD Persistence

The various storage levels in RDD Persistence are explained below in detail.
MEMORY_ONLY
The Memory Only level allows storing RDD as deserialized Java objects. However, if any RDD does not fit in memory, a few partitions will not be cached. These will be re-computed on the go when required.
MEMORY_AND_DISK
The Memory and Disk level allows storage of RDD as deserialized Java objects. Also, if any RDD does not fit in memory, it stores it on the disk and reads from the disk when required.
MEMORY_ONLY_SER
The Memory Only Ser level stores RDD as serialized Java objects. This enables better space efficiency, especially in case of a fast serializer.
MEMORY_AND_DISK_SER
The Memory and Disk Ser level is similar to MEMORY_ONLY_SER, except that it spills partitions that do not fit in memory to disk.
DISK_ONLY
The Disk Only level allows storage of RDD partitions only on disk.
MEMORY_ONLY_2, MEMORY_AND_DISK_2
The Memory Only 2, Memory And Disk 2, and other levels ending in _2 are the same as the corresponding levels above, except that they replicate every partition on two cluster nodes.
OFF_HEAP
The Off-Heap level, which is experimental, allows storage of the RDD in serialized format in Tachyon, the default off-heap option in Apache Spark. It reduces garbage collection overhead compared to the Memory Only Ser level and avoids losing the in-memory cache.
Next, we will see how to select the correct storage level.

Selecting the correct storage level

Storage levels trade off memory usage against CPU efficiency, so the level should be chosen deliberately. Consider the following when deciding which storage level to select.
  • If the RDDs fit in memory with the default storage level, which is the Memory Only level, leave the default level as is.
  • If the RDDs do not fit in memory at the default level, use the Memory Only Ser level together with a fast serialization library.
  • When fast fault recovery is needed, use the replicated storage levels, such as MEMORY_ONLY_2 and MEMORY_AND_DISK_2.
  • If environments have high amounts of memory or multiple applications, use the experimental Off-Heap storage level.
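The guidelines above can be written down as a small decision helper. This is a hypothetical sketch: the function suggest_storage_level, its parameters, and the order in which the rules are checked are assumptions made for illustration. It returns only the level name as a string and does not call Spark.

```python
def suggest_storage_level(fits_in_memory,
                          needs_fast_recovery=False,
                          large_or_shared_memory=False):
    """Return the name of a storage level following the guidelines above.

    The priority order of the checks is an assumption of this sketch.
    """
    if needs_fast_recovery:
        # Replicated levels keep every partition on two nodes.
        return "MEMORY_ONLY_2" if fits_in_memory else "MEMORY_AND_DISK_2"
    if large_or_shared_memory:
        # High memory or several applications: experimental off-heap level.
        return "OFF_HEAP"
    if fits_in_memory:
        # The default level is good enough, so leave it as is.
        return "MEMORY_ONLY"
    # Not enough space for deserialized objects: store them serialized.
    return "MEMORY_ONLY_SER"


print(suggest_storage_level(fits_in_memory=True))
print(suggest_storage_level(fits_in_memory=False))
print(suggest_storage_level(fits_in_memory=False, needs_fast_recovery=True))
```

In a real application the choice also depends on how expensive the RDD is to recompute, so a helper like this is a starting point rather than a rule.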
In the following section of this Spark Tutorial, we will try to understand what Distributed Persistence is.

Distributed Persistence

By default, Apache Spark RDDs are distributed across the cluster and persisted in memory in executor virtual machines.
For example, two partitions of an RDD can reside on different nodes, in different executor JVMs. This enables fault tolerance.
Now, suppose node 2 goes down. The RDD partition on that node is lost, so the driver starts a new task to re-compute the partition on a different node.
In the example, the task is now executed on node 3. Because the RDD lineage is preserved, the lost partition can be rebuilt and the data is not lost.
The persistence option on an RDD can be changed. To unmark an RDD for persistence and remove it from memory and disk, use the rdd.unpersist method.
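The node-failure scenario above can be mimicked with a toy model in plain Python. This is a sketch under assumptions, not how Spark schedules tasks: the node names, the cache dictionary, and the helper functions compute_partition and read_partition are all hypothetical.

```python
# Input split per partition id (hypothetical sample data).
source = {0: [1, 2, 3], 1: [4, 5, 6]}


def compute_partition(pid):
    # The "lineage": how a partition is derived from its input.
    return [x * 10 for x in source[pid]]


# Each node caches the partition it computed.
cache = {
    "node1": {0: compute_partition(0)},
    "node2": {1: compute_partition(1)},
}


def read_partition(pid, live_nodes):
    """Return (node, rows), recomputing the partition if no live node has it."""
    for node in live_nodes:
        if pid in cache.get(node, {}):
            return node, cache[node][pid]
    # Lost partition: rebuild it from lineage on another live node.
    target = live_nodes[-1]
    cache.setdefault(target, {})[pid] = compute_partition(pid)
    return target, cache[target][pid]


del cache["node2"]                                    # node 2 goes down
node, rows = read_partition(1, ["node1", "node3"])    # node 3 takes over
print(node, rows)
```

The partition that lived on node 2 is rebuilt on node 3 from the same transformation, so the result is identical to what was cached before the failure.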

Fault Tolerance in Spark RDD

Apache Spark RDDs are fault tolerant as they track data lineage information.
  • They rebuild lost data on failure using lineage.
  • Each RDD remembers how it was created from other datasets (by transformations such as map, join, or group by) and recreates itself.
  • You can mark an RDD to be persisted using the persist() or cache() methods.
  • The first time it is computed in an action, it is kept in memory on the nodes.
  • Apache Spark’s cache is fault-tolerant, which means if any partition of an RDD is lost, it will automatically be recomputed using the transformations that created it.
  • The distributed persistence architecture is targeted at applications that have distributed active requirements.

Changing Persistence Options

To change the Apache Spark RDD persistence to a different storage level, the persistence on the RDD must first be removed. So the unpersist method must first be used, and then the RDD can again be marked for persistence with a different storage level.
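The unpersist-then-persist sequence can be sketched with a toy class in plain Python. This is a model of the rule described above and not the Spark API: the class ToyPersistedRDD and the ValueError it raises are assumptions made for illustration. In PySpark, the corresponding calls would be rdd.unpersist() followed by rdd.persist(...) with a StorageLevel value.

```python
class ToyPersistedRDD:
    """Toy model of the persistence flag on an RDD (not the Spark API)."""

    def __init__(self):
        self.storage_level = None   # None means "not marked for persistence"

    def persist(self, level="MEMORY_ONLY"):
        # Modelled rule: a level that is already set cannot be swapped in place.
        if self.storage_level is not None and self.storage_level != level:
            raise ValueError("unpersist first, then persist with the new level")
        self.storage_level = level
        return self

    def unpersist(self):
        self.storage_level = None
        return self


rdd = ToyPersistedRDD().persist("MEMORY_ONLY")

try:
    rdd.persist("DISK_ONLY")        # changing the level directly is rejected
    changed_directly = True
except ValueError:
    changed_directly = False

rdd.unpersist().persist("DISK_ONLY")  # remove persistence, then mark again
print(changed_directly, rdd.storage_level)
```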

Summary

Let’s now summarize what we learned in this lesson.
  • When RDD lineage is preserved, every parent transformation is re-computed for each action operation.
  • To overcome the issue of re-computation and to improve performance, RDD persistence can be used.
  • An RDD can be persisted by using the persist method. By default, this saves the data of the persisted RDD in memory, which enables faster access and computation.
  • Each persisted RDD can be stored at a different storage level.
  • Distributed persistence enables fault tolerance and avoids loss of data.
  • Persistence options can be changed as needed.

Conclusion

This concludes the lesson on Spark RDD Optimization Techniques.