Saturday, May 30, 2020

How to Optimize the Hadoop cluster for high performance?

A Hadoop cluster is the central part of the Hadoop framework: a group of systems linked together over a LAN that is used to store and process big data sets. A Hadoop cluster consists of several pieces of commodity hardware linked together, and they communicate with a high-end system that acts as the master.
Installing a Hadoop cluster in production is like getting into a battlefield: the Hadoop admin needs to tune the cluster setup to achieve high performance. Out of the box, a Hadoop cluster is configured with default settings that assume modest hardware, so admins must be familiar with the actual hardware configuration they are running on.
There is no single performance tuning technique that fits all Hadoop workloads. The right tuning tools and tricks depend on the size of the data being moved and on the type of Hadoop job running in production.
The biggest selling point of Apache Hadoop as a big data processing framework is its cost-effectiveness in setting up data centers for processing large volumes of structured and unstructured data. However, the main obstacle to getting high performance out of a Hadoop cluster is its commodity hardware stack.
Therefore, the Hadoop admin has to make the best use of the cluster's capacity to gain the best possible performance from that hardware stack.
To learn the complete Hadoop admin tutorial, visit OnlineItGuru's hadoop admin online course.
Hadoop cluster performance tuning
Let us discuss in detail some of the most effective performance tuning techniques for setting up Hadoop clusters on commodity hardware, so as to increase cluster performance while reducing operational cost.
Hadoop cluster memory
The first step toward high performance for a Hadoop job is tuning the memory configuration parameters, which requires observing memory usage on the servers. Hadoop offers various options for memory, CPU, and network that help optimize the performance of the cluster. Moreover, each Hadoop MapReduce job reports counters such as the number of input records read, the number of records pipelined for further execution, reducer records, swap memory, and so on.
Hadoop jobs are generally not CPU-bound, so the prime concern is optimizing memory usage. The rule of thumb while tuning memory is to ensure that tasks do not start swapping. The memory available to each task is adjusted by modifying mapred.child.java.opts in the mapred-site.xml file.
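As a minimal sketch, the same property can also be set programmatically through the Hadoop Configuration API; the 2 GB heap value below is an assumption for illustration, not a recommendation.
import org.apache.hadoop.conf.Configuration;

public class MemoryTuningSketch {
   public static void main(String[] args) {
      // Give each map/reduce child JVM a 2 GB heap (the value is an assumption);
      // pick a size that keeps tasks from swapping on your nodes.
      Configuration conf = new Configuration();
      conf.set("mapred.child.java.opts", "-Xmx2048m");
      // Pass 'conf' to the Job you submit so the setting takes effect.
   }
}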
IO Performance improvement
There are some key factors to follow while optimizing MapReduce performance; they ensure that the Hadoop cluster setup is tuned well.
The Linux operating system records metadata for each file, such as a checksum, the last access time, the creation time, and the file's owner. Updating the last-access timestamp on every read hurts IO performance, so it should be disabled on the disks that back HDFS. Since HDFS follows the write-once-read-many model, applications read data from HDFS far more often than they write it.
Therefore, the mount points used by the DataNodes should be mounted with the noatime option. This makes sure the access-time metadata is not updated each time data is read. Mounting both the MapReduce intermediate storage and the HDFS data directories with noatime automatically deactivates access-time tracking and yields increased IO performance.
It is also important not to use LVM and RAID on the DataNode machines, as they reduce performance.
Minimize Disk Spill
Disk IO is usually the major bottleneck for performance. Two ways to minimize disk spilling are:
● Give the mapper's spill buffer about 70% of the heap memory.
● Compress the mapper output.
Ideally, a map task should not spill more than once, because every additional spill means the data must be written to disk and read back again.
LZO compression
When the Map output is large, the intermediate data can be reduced with compression techniques such as LZO, BZip2, Snappy, and others. Map output is not compressed by default; setting mapreduce.map.output.compress to true enables it, and a companion property selects the compression codec to use, such as LZO or Snappy.
Any MapReduce job that produces a large Map output benefits from compressing the intermediate data with LZO: each 1 GB of output data saves up to about 3 GB of disk space. If a large amount of data still spills to disk during the Map tasks, increasing the spill buffer size helps as well.
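As a hedged sketch, intermediate compression can be enabled through the Configuration API as shown below; the Snappy codec is used here as the example because it ships with Hadoop, while LZO requires the separate hadoop-lzo library.
import org.apache.hadoop.conf.Configuration;

public class MapOutputCompressionSketch {
   public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Turn on compression of the intermediate (map output) data.
      conf.setBoolean("mapreduce.map.output.compress", true);
      // Choose the codec; Snappy is shown as an example.
      conf.set("mapreduce.map.output.compress.codec",
               "org.apache.hadoop.io.compress.SnappyCodec");
   }
}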
Tuning the number of mapper/reducer tasks
A map or reduce task generally takes about 40 seconds to finish. When there is a big job to run, it often does not use all the slots available within the Hadoop cluster. It is therefore important to tune the number of map and reduce tasks using the techniques below:
● If a MapReduce job has more than 1 TB of input, consider making the number of tasks smaller by increasing the block size of the input dataset to 512 MB. The block size of existing files can also be changed via the dfs.block.size setting when the data is rewritten with the new block size; once the commands to change the block size have been run, the original data can be deleted (see the sketch after this list).
● If a MapReduce job on the cluster launches many map tasks that each finish in only a few seconds, then reducing the number of maps launched, without otherwise impacting the cluster setup, will help optimize its performance.
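As referenced in the first point above, here is a minimal sketch of raising the block size for files written with a given configuration; the 512 MB figure follows the text, and dfs.block.size is the classic property name (newer releases also accept dfs.blocksize).
import org.apache.hadoop.conf.Configuration;

public class BlockSizeSketch {
   public static void main(String[] args) {
      Configuration conf = new Configuration();
      // 512 MB block size, in bytes, for files written with this configuration.
      conf.setLong("dfs.block.size", 512L * 1024 * 1024);
   }
}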
Using Skewed Joins
Using standard joins in transformation logic written with Pig or Hive can cripple the performance of the underlying MapReduce jobs when the processed data is skewed, meaning that something like 80% of the data flows to a single reducer. If one key carries a very large share of the data, a single reducer ends up holding most of the processing. This is where a skewed join helps: it computes a histogram to identify the dominant keys and then splits their data across different reducers to gain optimal performance.
Writing a Combiner
Depending on the workload on the Hadoop cluster, writing a combiner reduces the amount of data transferred between the map and reduce phases. This is useful in addition to the data compression technique and also proves beneficial in improving cluster performance.
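A minimal sketch of a combiner, assuming a word-count style job where values are counts: because summing is commutative and associative, the same class can safely act as both combiner and reducer.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;

public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
   @Override
   protected void reduce(Text key, Iterable<IntWritable> values, Context context)
         throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
         sum += v.get();   // partial aggregation, done map-side when used as a combiner
      }
      context.write(key, new IntWritable(sum));
   }

   // Hypothetical wiring inside your job setup:
   public static void configure(Job job) {
      job.setCombinerClass(SumCombiner.class);
      job.setReducerClass(SumCombiner.class);
   }
}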
Speculative Execution
The performance of MapReduce jobs suffers badly when tasks take a long time to finish their execution. Speculative execution is a common approach to this problem: slow-running tasks are backed up on other machines, and whichever copy finishes first is used.
Speculative execution is enabled by setting the configuration parameters mapreduce.map.speculative and mapreduce.reduce.speculative (the older mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution in MRv1) to true. This helps reduce job execution time when task progress becomes slow, for example due to a lack of memory.
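A hedged sketch of turning speculative execution on through the Configuration API follows; the property names shown are the Hadoop 2.x ones.
import org.apache.hadoop.conf.Configuration;

public class SpeculativeExecutionSketch {
   public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Back up slow map and reduce tasks on other nodes.
      conf.setBoolean("mapreduce.map.speculative", true);
      conf.setBoolean("mapreduce.reduce.speculative", true);
   }
}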
There are many performance optimization tips and tricks for a Hadoop cluster, and we have discussed some of the important ones above. The Hadoop community also keeps publishing tips that help obtain optimal performance. Because Hadoop scales horizontally, admins can keep adding instances to the cluster to increase performance, and many users prefer to have their own Hadoop cluster so that it does not need to be shared. The tips mentioned above should help in achieving the best performance.
Get more practical results through the expert's voice by getting into hadoop admin online training at Online IT Guru. This learning may help enhance your Hadoop skills and lead to a better career.

Thursday, May 28, 2020

Hadoop MapReduce Partitioner in Hadoop administration

MapReduce is the heart of the Hadoop programming environment. Hadoop MapReduce ensures massive scalability across the many servers that make up a Hadoop cluster. People who are familiar with clustered scale-out data processing frameworks will find the MapReduce concepts in Hadoop easy to understand. It might not be as easy for newcomers to the platform, but the Hadoop MapReduce features below will help them understand what it is and how it works. Before learning about the MapReduce partitioner, let us briefly discuss Hadoop MapReduce and its features.
To learn the complete Hadoop admin tutorial, visit Onlineitguru's hadoop administration online training.
Hadoop MapReduce
MapReduce is simply the part of the Hadoop programming environment that performs two distinct tasks. The Map job takes specific data sets and transforms them into other data sets in which individual elements are broken into key/value pairs, or tuples. The Reduce job takes the output of the Map as its input and combines those tuples into smaller sets of tuples. The Reduce job always follows the Map job.
Features of Hadoop MapReduce
How the Hadoop MapReduce architecture works can be understood by simply taking the example of files and columns. For example, a user can have multiple files with two columns in the Hadoop cluster, one representing the key and the other representing the value. Real-life tasks may not be this simple, but this is essentially how Hadoop MapReduce works.
Tasks of Hadoop MapReduce
Real-world MapReduce tasks can contain millions of rows and columns and may not be well formatted either, but the fundamentals of how MapReduce works always remain the same. A city as the key and its rainfall quantum as the value is one example of the key/value pair MapReduce needs in order to function.
Hadoop MapReduce Data
Data collected and stored using the MapReduce function can help find out the maximum and minimum rainfall per area. If there are ten files, they are split into ten Map tasks; each mapper operates on one of the files, evaluates the data, and returns the required maximum or minimum rainfall in each case.
Performance of Hadoop MapReduce
The output streamed from all ten files is then fed into the Reduce process. The Reduce function combines all the input results and generates a single output value for each city, so a common final result sheet showing the rainfall in each of the cities is produced. The process is straightforward. It resembles the way such tasks were performed before computers and information technology had evolved this far and everything was done by hand: people were sent to various places to collect data, returned to their head office, and submitted the data they had collected. That is exactly how the Map phase works in MapReduce. The Hadoop MapReduce function comes with both a combiner and a partitioner; in this article we discuss the MapReduce partitioner.
Hadoop MapReduce Partitioner
A partitioner acts like a condition in the processing of the input data set. Partitioning occurs after the Map phase and before the Reduce phase.
The number of partitioners equals the number of reducers; that is, a partitioner divides the data according to the number of reducers, and each Reducer processes the data passed to it from a single partition.
Partitioner
A partitioner partitions the intermediate key-value pairs of the Map output. It partitions the data using a user-defined condition, which works like a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Let us take an example to understand how the partitioner works.
Implementing the Hadoop MapReduce Partitioner
For convenience, let us assume we have a small table called Employee with the following data. We will use this sample data as our input set to demonstrate how the partitioner works.
Map Task
The map task accepts key-value pairs as input, while the text information sits in a text file. The input for this map task is as follows.
Input
The key is a pattern such as "special key + filename + line number" (example: key = @input1), and the value is the data in that line (example: value = 1201\t gopal \t 45\t Male \t 50000).
Method
The operation of this map function is as follows.
Read the value (record data), which comes as the input value from the argument list, as a string.
Separate the gender using the split function and store it in a string variable.
String[] str = value.toString().split("\t", -3);
String gender = str[3];
Send the gender information and the record data value from the map task to the partitioner task as the output key-value pair.
context.write(new Text(gender), new Text(value));
Repeat all the above steps for every record in the text file.
Output
You will get the gender data and the record data value as key-value pairs.
Partitioner Task in Hadoop MapReduce
The partitioner task accepts the key-value pairs from the map task as its input. Partitioning means dividing the data into segments. Based on the given conditional criteria on age, the input key-value paired data can be divided into three parts.
Input
The whole key-value pair data in a collection.
Key = the gender field value in the record.
Value = the whole record data for that gender.
Method
The logic of the partition process runs as follows.
Read the age field value from the input key-value pair.
String[] str = value.toString().split("\t");
int age = Integer.parseInt(str[2]);
Check the age value with the following conditions:
● Age less than or equal to 20
● Age greater than 20 and less than or equal to 30
● Age greater than 30
if(age<=20)
{
return 0;
}
else if(age>20 && age<=30)
{
return 1 % numReduceTasks;
}
else
{
return 2 % numReduceTasks;
}
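Putting the pieces above together, here is a minimal sketch of the complete partitioner class. The class name matches the CaderPartitioner referenced in the job configuration later in this article, and the field positions assume the tab-separated record id, name, age, gender, salary used in this example.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class CaderPartitioner extends Partitioner<Text, Text> {
   @Override
   public int getPartition(Text key, Text value, int numReduceTasks) {
      String[] str = value.toString().split("\t");
      int age = Integer.parseInt(str[2]);

      if (numReduceTasks == 0) {
         return 0;               // no reducers: everything goes to one partition
      }
      if (age <= 20) {
         return 0;
      } else if (age > 20 && age <= 30) {
         return 1 % numReduceTasks;
      } else {
         return 2 % numReduceTasks;
      }
   }
}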
Output
The entire key-value pair data is segmented into three key-value pair collections. The Reducer functions on each collection individually.
Reduce Tasks with the Hadoop MapReduce Partitioner
The number of partitioner tasks is equal to the number of reducer tasks. Since we have three partitions here, three Reducer tasks are executed.
Input
The Reducer executes three times, once for each collection of key-value pairs.
● Key = the gender field value in the record.
● Value = the entire record data for that gender.
● Method − For every set the following logic is applied.
● Read the value of each record in the field Salary.
String [] str = val.toString().split("\t", -3);
Note: str[4] holds the salary field value.
Check the salary against the max variable. If str[4] is greater than the current max, assign str[4] to max; otherwise skip this step.
if(Integer.parseInt(str[4])>max)
{
max=Integer.parseInt(str[4]);
}
Repeat steps 1 and 2 for each key collection (the key collections are Male and Female). After executing these steps, you will have one maximum salary from the Male key collection and one maximum salary from the Female key collection.
context.write(new Text(key), new IntWritable(max));
Output
Finally, you will obtain three collections of key-value pair results, one for each age group. Each collection contains the maximum salary from the Male records and the maximum salary from the Female records in that age group. After running the Map, Partitioner, and Reduce tasks, the three collections of key-value pair data are stored in three separate output files.
All three tasks are treated as MapReduce jobs, and the following specifications should be set in the Configuration:
● Name of Job
● Key and value input and output formats
● Individual classes for the map, reduce, and partitioner tasks
Configuration conf = getConf();
//Create Job
Job job = new Job(conf, "topsal");
job.setJarByClass(PartitionerExample.class);
// File Input and Output paths
FileInputFormat.setInputPaths(job, new Path(arg[0]));
FileOutputFormat.setOutputPath(job,new Path(arg[1]));
//Set Mapper class and Output format for key-value pair.
job.setMapperClass(MapClass.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
//set partitioner statement
job.setPartitionerClass(CaderPartitioner.class);
//Set Reducer class and Input/Output format for key-value pair.
job.setReducerClass(ReduceClass.class);
//Number of Reducer tasks.
job.setNumReduceTasks(3);
//Input and Output format for data
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
Conclusion: I hope this article has given you a clear picture of the Hadoop MapReduce partitioner. You can learn about more Hadoop functions through the hadoop admin online course.

iOS memory management

Memory management is very important in any application, particularly in iOS applications, which run under memory and other constraints. It covers ARC, MRC, reference types, and value types. Memory leaks and app crashes are all too common in apps with poor iOS memory management.
Automatic Reference Counting (ARC) in iOS memory management
In Swift, ARC is conceptually the same as in Objective-C. ARC keeps track of strong references to class instances and increases or decreases their reference count as class instances (reference types) are assigned to or removed from constants, properties, and variables. It deallocates the memory used by objects whose reference count has dropped to zero. ARC does not increase or decrease the reference count of value types, because these are copied on assignment. By default, all references are strong references.
To learn complete iOS tutorials visit: ios app development course
Strong reference cycles
Strong reference cycles are one of the key concepts to be aware of under ARC. To fully deallocate a class instance under ARC, it must be free of all strong references. But there is a chance that you could structure your code in such a way that two instances strongly reference each other and thus never allow each other's reference count to drop to zero.
In Swift, there are two ways to solve this.
● Weak references
● Unowned references
Both of these approaches let one instance refer to another without keeping a strong reference to it. Use the weak keyword before a property or variable declaration for the first, and the unowned keyword for the second.
● Weak reference: a weak reference is used when you know that the reference can become nil at some point. Since weak references can hold a value or no value at all, they must be declared as optional variables.
● Unowned reference: an unowned reference is used when you are certain that the other instance has the same or a longer lifetime and will never become nil, so it is declared as non-optional.
● Strong reference cycles with closures: strong reference cycles involving closures are another important concept. Closures used inside a class instance can capture self; if self retains the closure in turn, you get a strong reference cycle between the closure and the class instance. This often happens, for example, with lazily loaded properties. To prevent it, you use the same weak and unowned keywords: when you define the closure, a so-called capture list is added to its scope.
● Capture lists: a capture list determines how the references captured by the closure are treated. By default, without a capture list, everything is captured with a strong reference. Capture lists are defined either on the same line as the closure's opening brace or on the line after it. They are written as a pair of square brackets; each element inside is prefixed with the weak or unowned keyword and separated from the others by a comma. The same reasoning applies to a closure capture list as to variable references: define a captured reference as a weak optional if it may become nil and the closure will not be deallocated before then, and define it as unowned if it never becomes nil before the closure is deallocated.
Originally, iOS memory management was non-ARC (that is, manual retain/release), where we had to retain and release objects ourselves. Now it supports ARC, so we do not have to retain and release objects; Xcode takes care of this automatically at compile time.
Issues governing iOS memory management
The two main issues in IOS memory management according to Apple's documentation are as follows.
● Freeing or overwriting data that is still in use. This causes memory corruption and typically results in your application crashing, or worse, corrupting user data.
● Failing to free data that is no longer in use causes memory leaks. A memory leak occurs when reserved memory is not freed even though it will never be used again. Leaks make your application use ever-growing amounts of memory, which in turn can lead to poor system performance or (in iOS) termination of your application.
Rules of iOS memory management
● You own the objects you create, and you must release them when they are no longer needed.
● Use retain to take ownership of an object you did not create. You must release these objects too when they are no longer required.
● Do not release objects you do not own.
● Conversely, if you are not the creator of an object and have not expressed an ownership interest, you must not release it.
● If you receive an object from another part of your program, it is usually guaranteed to remain valid within the method or function in which it was received. If you want it to remain valid beyond that scope, you should retain or copy it. If you try to release an object that has already been deallocated, your program crashes.
Aspects of iOS memory management
For a proper understanding and management of object memory, the following concepts are essential.
● Autorelease pools. Sending autorelease to an object marks it for a later release, which is useful when you want the object to persist beyond the current scope. Autoreleasing an object places it in an autorelease pool (an instance of NSAutoreleasePool), which is created for an arbitrary program scope. When program execution exits that scope, the objects in the pool are released.
● Deallocation. When an object's retain count drops to zero, the runtime calls the dealloc method of the object's class just before destroying the object. A class implements this method to free any resources the object holds, including the objects its instance variables refer to.
● Factory methods. Many framework classes define class methods that create objects for you as a convenience. Objects returned this way are not guaranteed to remain valid beyond the scope in which they are received.
Handling memory under ARC
With ARC you do not have to use retain and release. When a view controller is removed, all of its objects are released; likewise, the sub-objects of any object are released when the object itself is.
Remember that if other classes hold a strong reference to an object of a class, that object will not be released. It is therefore recommended that delegates be declared as weak properties.
Tools for iOS Memory Management
With the help of the Xcode Instruments tool, we can examine memory usage. It contains instruments such as Activity Monitor, Allocations, Leaks, Zombies, etc.
Conclusion
This article briefly explained iOS memory management. You can learn more iOS concepts through ios online training.

Wednesday, May 27, 2020

Machine Learning with Spark MLlib in big data and Hadoop

Machine learning is a branch of Artificial Intelligence that enables systems to build data models that automate the decision-making process. Spark MLlib (Machine Learning Library) is the ML component of Spark that scales computation for ML algorithms. Moreover, Spark MLlib is Spark's core ML module, providing popular ML algorithms and utilities.

Spark MLlib offers fast, easy, and scalable deployment of different kinds of machine learning components.

Spark MLlib is designed for simplicity and scalability, and it integrates easily with other tools. With these facilities and the speed of Spark, data scientists can focus on their data and model issues instead of on the complexities of distributed data. Furthermore, Spark MLlib integrates seamlessly with the other Spark components.

To learn complete big data and Hadoop tutorials visit: big data online course

Spark MLlib vs Spark ML
Spark MLlib is used to perform ML in Apache Spark and consists of various algorithms and utilities. There are, however, some differences between Spark MLlib and Spark ML.

spark.mllib contains the original APIs, built on top of Spark's RDDs (Resilient Distributed Datasets); it is currently in maintenance mode. spark.ml provides higher-level APIs built on top of DataFrames, useful for constructing ML pipelines, and it is currently the primary Machine Learning API for Apache Spark.

spark.ml is useful because the DataFrame-based API is more versatile and flexible. The developers keep supporting spark.mllib alongside the development of spark.ml, and many users are still comfortable using spark.mllib's features. Spark ML provides users with a toolset to create pipelines of different machine-learning-related transformations. In short, the major differences are as follows; a short spark.ml pipeline sketch follows the lists below.

Spark ML includes;
● New
● Pipelines
● Data frames
● Easy to construct ML pipelines
Spark MLlib includes;
● Old
● RDD's (Resilient Distributed Datasets)
● Many other features to come
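Here is that sketch: a minimal spark.ml pipeline in Java. The file path, column names, and the choice of logistic regression are assumptions for illustration only, not part of any particular dataset.
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkMlPipelineSketch {
   public static void main(String[] args) {
      SparkSession spark = SparkSession.builder()
            .appName("spark.ml pipeline sketch")
            .getOrCreate();

      // Hypothetical input: a CSV file with numeric columns f1, f2 and a 0/1 label.
      Dataset<Row> data = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("hdfs:///data/training.csv");

      // Featurization step: combine raw columns into a single feature vector.
      VectorAssembler assembler = new VectorAssembler()
            .setInputCols(new String[]{"f1", "f2"})
            .setOutputCol("features");

      // Learning step: a simple classifier.
      LogisticRegression lr = new LogisticRegression()
            .setLabelCol("label")
            .setFeaturesCol("features");

      // The pipeline chains featurization and model training.
      Pipeline pipeline = new Pipeline()
            .setStages(new PipelineStage[]{assembler, lr});
      PipelineModel model = pipeline.fit(data);

      // Persistence: save the fitted pipeline for later reuse (path is an assumption).
      // model.write().overwrite().save("hdfs:///models/lr-pipeline");

      spark.stop();
   }
}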
Spark MLlib architecture
Spark MLlib consists of various machine learning libraries. This architecture
provides the following tools:
● Machine Learning Algorithms:
The ML algorithms are the core part of Machine Learning libraries. These
include some common learning algorithms such as classification,
regression, clustering, and filtering.
● ML Pipelines:
The machine learning pipelines include tools for constructing, evaluating,
and tuning of various ML Pipelines.
● Persistence:
Persistence helps in saving and loading algorithms, models, and different ML Pipelines within the architecture.
● Featurization:
Featurization includes feature extraction, transformation, dimensionality reduction, and selection.
● Utilities:
These provide utility for linear algebra, statistics, and data handling for
Spark MLlib.
Spark MLlib Algorithms
There are many popular algorithms and utilities within Spark MLlib. These are:
● Statistics
● Classification
● Recommendation System
● Regression
● Clustering
● Optimization
● Feature Extraction
Statistics
Basic statistics are among the most fundamental ML techniques. They include the following:
Summary Statistics: these include mean, variance, count, max, and min.
Correlations: Pearson's and Spearman's methods for finding the correlation in the given data.
Hypothesis Testing: Pearson's chi-squared test is one example.
Random Data Generation: Random RDDs with Normal and Poisson distributions are useful for generating random data.
Stratified Sampling: sampleByKey and sampleByKeyExact are sampling techniques useful for testing on sample data.
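As a small illustration of these statistics utilities, here is a sketch of computing Pearson and Spearman correlations with the RDD-based spark.mllib API; the tiny series below are made-up values purely for demonstration.
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.stat.Statistics;

public class CorrelationSketch {
   public static void main(String[] args) {
      SparkConf conf = new SparkConf().setAppName("correlation sketch");
      JavaSparkContext sc = new JavaSparkContext(conf);

      // Two small, made-up series just to show the calls.
      JavaDoubleRDD x = sc.parallelizeDoubles(Arrays.asList(1.0, 2.0, 3.0, 4.0));
      JavaDoubleRDD y = sc.parallelizeDoubles(Arrays.asList(10.0, 21.0, 29.0, 42.0));

      double pearson = Statistics.corr(x.srdd(), y.srdd(), "pearson");
      double spearman = Statistics.corr(x.srdd(), y.srdd(), "spearman");

      System.out.println("pearson = " + pearson + ", spearman = " + spearman);
      sc.stop();
   }
}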
Classification
Classification is the problem of identifying which of a set of categories a new observation belongs to, based on a training dataset containing instances whose category membership is known. It falls under pattern recognition.
For example, we could assign an email to the "spam" or "non-spam" class, covering unwanted mail, debit card fraud alerts, and so on.
Recommendation System

A recommendation system is a part of data filtering that helps to predict the
rating that a user gives to an item. These systems have become very popular in
recent years. Moreover, they are utilized in different areas such as movies, music,
news, books, research articles, queries, social media, and general products.
Moreover, these systems typically produce a list of recommendations in one of
two ways. These include collaborative and content-based filtering approaches.

● The collaborative filtering approach builds a model from the user's past behavior (items previously purchased or selected), combined with similar decisions made by other users. This model is then used to predict items, or ratings for items, that the user may be interested in.
● The content-based filtering approach uses a series of discrete characteristics of an item to recommend additional items with similar properties. (A small sketch of collaborative filtering with ALS follows this list.)
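Here is that sketch: collaborative filtering with alternating least squares (ALS) from spark.ml. The input path, column names, and parameter values are assumptions for illustration.
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AlsRecommenderSketch {
   public static void main(String[] args) {
      SparkSession spark = SparkSession.builder()
            .appName("ALS recommender sketch")
            .getOrCreate();

      // Hypothetical ratings table with columns userId, itemId, rating.
      Dataset<Row> ratings = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("hdfs:///data/ratings.csv");

      // Collaborative filtering: learn latent factors from past user behavior.
      ALS als = new ALS()
            .setUserCol("userId")
            .setItemCol("itemId")
            .setRatingCol("rating")
            .setRank(10)
            .setMaxIter(10);

      ALSModel model = als.fit(ratings);

      // Produce the top 5 item recommendations for every user.
      Dataset<Row> recommendations = model.recommendForAllUsers(5);
      recommendations.show();

      spark.stop();
   }
}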
Regression
Regression analysis is a statistical process for estimating the relationships among variables. It includes many tools and techniques for modeling and analyzing several variables, with the focus on the relationship between a dependent variable and one or more independent variables.
Regression analysis helps in understanding how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.

Furthermore, this kind of analysis is widely useful in making predictions and
forecasting.

Clustering
Clustering is the task of grouping a set of objects in such a way that objects in the same group, or cluster, are more similar to each other than to those in other groups.
It is an important task in exploratory data mining and a common technique for statistical data analysis used in many fields, including ML, pattern recognition, image analysis, data gathering, computer graphics, and many more. Some clustering examples include the following (a k-means sketch follows the list):
● Search results grouping
● Grouping similar customers
● Grouping similar patients, etc.
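Here is that sketch: grouping similar customers with k-means from spark.ml. The input path, the two feature columns, and the choice of three clusters are assumptions for illustration.
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KMeansClusteringSketch {
   public static void main(String[] args) {
      SparkSession spark = SparkSession.builder()
            .appName("k-means clustering sketch")
            .getOrCreate();

      // Hypothetical customer data with numeric columns age and spend.
      Dataset<Row> customers = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("hdfs:///data/customers.csv");

      // Combine the raw columns into the feature vector k-means expects.
      Dataset<Row> features = new VectorAssembler()
            .setInputCols(new String[]{"age", "spend"})
            .setOutputCol("features")
            .transform(customers);

      // Group similar customers into three clusters (k is an assumption).
      KMeans kmeans = new KMeans().setK(3).setSeed(1L);
      KMeansModel model = kmeans.fit(features);

      // Attach a cluster id to every customer record.
      model.transform(features).show();

      spark.stop();
   }
}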
Feature Extraction
The process of feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative. This facilitates the subsequent learning and generalization steps and, in some cases, leads to better human interpretation as well. Feature extraction is closely related to dimensionality reduction.
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It is divided into two parts: feature selection and feature extraction.
Feature Selection: feature selection finds a subset of the original variables (also called features or attributes).
Feature Extraction: this transforms data from a high-dimensional space into a space of fewer dimensions.

Optimization
Optimization refers to the selection of the best element from a given set of available alternatives.

In general, optimization means finding the best available value of an objective function over a defined input domain, covering a variety of different types of objective functions and different types of domains or inputs.
To conclude, I hope the discussion above gives an idea of machine learning with Spark MLlib and its different aspects. Machine learning techniques and tools help make any system process easier. Furthermore, using Apache Spark MLlib for large-scale ML strategies, from Big Data classification to clustering, is a great theme: it gives a system the strength of self-learning from past activity. Spark MLlib helps greatly in this regard by offering its various learning libraries.

All of this makes a strong case for learning Spark and its different libraries. To get in-depth knowledge of these libraries, take Big Data Hadoop Online Training from industry experts like IT Guru. This learning may help enhance your skills and provide the best path toward a great career.

Monday, May 25, 2020

Android - XML Parser

XML stands for Extensible Markup Language. XML is a very popular format, commonly used for sharing data on the internet. This chapter explains how to parse an XML file and extract the necessary information from it.
Android provides three types of XML parsers: DOM, SAX, and XMLPullParser. Among these, Android recommends XMLPullParser because it is efficient and easy to use, so we are going to use XMLPullParser for parsing XML.
The first step is to identify the fields in the XML data that you are interested in. For example, in the XML given below we are interested only in getting the temperature.
<?xml version="1.0"?><current><cityid="2643743"name="London"><coordlon="-0.12574"lat="51.50853"/><country>GB</country><sunrise="2013-10-08T06:13:56"set="2013-10-08T17:21:45"/></city><temperaturevalue="289.54"min="289.15"max="290.15"unit="kelvin"/><humidityvalue="77"unit="%"/><pressurevalue="1025"unit="hPa"/></current>

Android XML parsing is a part of the android app development course offered by OnlineItGuru.

XML - Elements

An XML file consists of several components. Here is a table defining the components of an XML file and their descriptions.
1. Prolog: An XML file starts with a prolog. The first line, which contains information about the file, is the prolog.
2. Events: An XML file has many events. An event could be: document starts, document ends, tag start, tag end, text, etc.
3. Text: Apart from tags and events, an XML file also contains simple text. For example, GB is text inside the country tag.
4. Attributes: Attributes are the additional properties of a tag, such as value, etc.

XML - Parsing

In the next step, we will create the XMLPullParser object. In order to create it, we first create an XmlPullParserFactory object and then call its newPullParser() method to create the XMLPullParser. Its syntax is given below −
private XmlPullParserFactory xmlFactoryObject = XmlPullParserFactory.newInstance();
private XmlPullParser myparser = xmlFactoryObject.newPullParser();
The next step involves specifying the file for XmlPullParser that contains the XML. It could be a file or a stream; in our case it is a stream. Its syntax is given below −
myparser.setInput(stream, null);
The last step is to parse the XML. An XML file consists of events, names, text, attribute values, etc., so XMLPullParser has a separate function for parsing each component of the XML file. Its syntax is given below −
int event = myParser.getEventType();
while (event != XmlPullParser.END_DOCUMENT) {
   String name = myParser.getName();
   switch (event) {
      case XmlPullParser.START_TAG:
         break;
      case XmlPullParser.END_TAG:
         if (name.equals("temperature")) {
            temperature = myParser.getAttributeValue(null, "value");
         }
         break;
   }
   event = myParser.next();
}
The method getEventType returns the type of event that has occurred, e.g. document start, tag start, etc. The method getName returns the name of the tag; since we are only interested in the temperature, we simply check in a conditional statement whether we got a temperature tag, and if so we call getAttributeValue to return the value of the temperature tag.
Apart from these methods, this class provides other methods for parsing XML files better. These methods are listed below −
1. getAttributeCount(): Returns the number of attributes of the current start tag.
2. getAttributeName(int index): Returns the name of the attribute specified by the index value.
3. getColumnNumber(): Returns the current column number, starting from 0.
4. getDepth(): Returns the current depth of the element.
5. getLineNumber(): Returns the current line number, starting from 1.
6. getNamespace(): Returns the namespace URI of the current element.
7. getPrefix(): Returns the prefix of the current element.
8. getName(): Returns the name of the tag.
9. getText(): Returns the text for that particular element.
10. isWhitespace(): Checks whether the current TEXT event contains only whitespace characters.

Example

Here is an example demonstrating the use of the XML DOM parser. It creates a basic application that allows you to parse XML.
To experiment with this example, you can run it on an actual device or in an emulator.
Steps:
1. You will use Android Studio to create an Android application under the package com.example.sairamkrishna.myapplication.
2. Modify the src/MainActivity.java file to add the necessary code.
3. Modify res/layout/activity_main to add the respective XML components.
4. Create a new XML file under the Assets folder: file.xml.
5. Modify AndroidManifest.xml to add the necessary internet permission.
6. Run the application, choose a running Android device, install the application on it, and verify the results.
Following is the content of the modified main activity file MainActivity.java.
package com.example.sairamkrishna.myapplication;

import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class MainActivity extends Activity {
   TextView tv1;

   @Override
   public void onCreate(Bundle savedInstanceState) {
      super.onCreate(savedInstanceState);
      setContentView(R.layout.activity_main);
      tv1 = (TextView) findViewById(R.id.textView1);

      try {
         InputStream is = getAssets().open("file.xml");
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
         Document doc = dBuilder.parse(is);

         Element element = doc.getDocumentElement();
         element.normalize();

         NodeList nList = doc.getElementsByTagName("employee");
         for (int i = 0; i < nList.getLength(); i++) {
            Node node = nList.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
               Element element2 = (Element) node;
               tv1.setText(tv1.getText() + "\nName : " + getValue("name", element2) + "\n");
               tv1.setText(tv1.getText() + "Surname : " + getValue("surname", element2) + "\n");
               tv1.setText(tv1.getText() + "-----------------------");
            }
         }
      } catch (Exception e) {
         e.printStackTrace();
      }
   }

   private static String getValue(String tag, Element element) {
      NodeList nodeList = element.getElementsByTagName(tag).item(0).getChildNodes();
      Node node = nodeList.item(0);
      return node.getNodeValue();
   }
}
Following is the content of Assets/file.xml.
<?xml version="1.0"?><records><employee><name>Sairamkrishna</name><surname>Mammahe</surname><salary>50000</salary></employee><employee><name>Gopal </name><surname>Varma</surname><salary>60000</salary></employee><employee><name>Raja</name><surname>Hr</surname><salary>70000</salary></employee></records>
Following is the modified content of the xml res/layout/activity_main.xml.
<?xml version="1.0" encoding="utf-8"?><RelativeLayoutxmlns:android="http://schemas.android.com/apk/res/android"xmlns:tools="http://schemas.android.com/tools"android:layout_width="match_parent"android:layout_height="match_parent"android:paddingBottom="@dimen/activity_vertical_margin"android:paddingLeft="@dimen/activity_horizontal_margin"android:paddingRight="@dimen/activity_horizontal_margin"android:paddingTop="@dimen/activity_vertical_margin"tools:context=".MainActivity"><TextViewandroid:id="@+id/textView1"android:layout_width="wrap_content"android:layout_height="wrap_content"/></RelativeLayout>
Following is the content of AndroidManifest.xml file.
<?xml version="1.0" encoding="utf-8"?><manifestxmlns:android="http://schemas.android.com/apk/res/android"package="com.example.sairamkrishna.myapplication"><applicationandroid:allowBackup="true"android:icon="@mipmap/ic_launcher"android:label="@string/app_name"android:theme="@style/AppTheme"><activityandroid:name=".MainActivity"android:label="@string/app_name"><intent-filter><actionandroid:name="android.intent.action.MAIN"/><categoryandroid:name="android.intent.category.LAUNCHER"/></intent-filter></activity></application></manifest>
Let's try to run the application we just modified. I assume you created your AVD while doing the environment setup. To run the app from Android Studio, open one of your project's activity files and click the Run icon in the toolbar. Android Studio will install the app on your AVD and start it, and if everything is fine with your setup and application, it will display the Emulator window.
To learn the Android course visit: android training online