How to remove tags from HTML in Perl

Satya Perl Solution ,

Often in a development project, you get a requirement to remove the tags from an HTML page in order to extract its text. Each programming language has its own methods for removing HTML tags. Perl does this concisely with a regular expression.

In this post, I will explain how to remove tags from HTML in Perl using a regular expression.

If you are new to Perl programming, refer to the Perl basic tutorial.


Remove HTML tags using Perl regular expression

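The code below reads an HTML file and removes the tags to extract the text. ‘undef $/’ undefines the input record separator so that <> reads the whole file at once and assigns its content to the scalar variable $content. ‘s/<.*?>//gs’ matches each opening and closing tag and substitutes it with nothing: the non-greedy ‘.*?’ stops at the first ‘>’, the g modifier repeats the substitution for every tag, and the s modifier lets ‘.’ match newlines, so tags that span several lines are removed too. Finally, the print statement prints only the text.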

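#!/usr/bin/perl
use strict;
use warnings;

# Slurp mode: with the record separator undefined, <> returns a whole file.
undef $/;
while (my $content = <>) {
    # Remove every tag; /s lets "." match newlines inside multi-line tags.
    $content =~ s/<.*?>//gs;
    print $content;
}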
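
To see why the pattern uses the non-greedy ‘.*?’, here is a minimal sketch that applies the same substitution to an in-memory string instead of a file. The helper name strip_tags and the sample string are assumptions made for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name, not part of the original script.
sub strip_tags {
    my ($html) = @_;
    # Non-greedy .*? stops at the first '>', so each tag is removed separately.
    (my $text = $html) =~ s/<.*?>//gs;
    return $text;
}

# Illustrative input; the first tag spans two lines to exercise the /s modifier.
my $html = "<p class=\"intro\"\n   id=\"first\">Hello, <b>Perl</b>!</p>";

print strip_tags($html), "\n";   # prints: Hello, Perl!

# For contrast, a greedy .* runs from the first '<' to the last '>',
# which here swallows the text between the tags as well.
(my $greedy = $html) =~ s/<.*>//gs;
print "greedy pattern removed everything\n" if $greedy eq '';
```

With the greedy form the whole string disappears, because the match extends from the first ‘<’ to the last ‘>’. The non-greedy form keeps the text between the tags.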

Below is the sample.html file, which I will run through the script above.

Sample html file

Script Output:

[Image: console output of the script]

As you can see, remove-html-tags.pl removes all the HTML tags from the content of sample.html and prints the output to the console.
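
One caveat: the pattern treats everything between ‘<’ and ‘>’ as a tag, so the contents of script and style elements stay in the output, and a ‘>’ inside a comment can end a match early. The sketch below shows one possible way to handle those cases with two extra substitutions before the tag removal. The helper name extract_text and the sample string are hypothetical, and this is still a regular-expression approximation, not a full HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# extract_text is a hypothetical helper; the original script only runs s/<.*?>//gs.
sub extract_text {
    my ($html) = @_;
    # Drop comments first, since they may contain '>' characters.
    $html =~ s/<!--.*?-->//gs;
    # Drop <script> and <style> elements together with their contents.
    $html =~ s/<(script|style)\b.*?<\/\1\s*>//gis;
    # Remove the remaining tags, as in the original script.
    $html =~ s/<.*?>//gs;
    return $html;
}

# Illustrative input only.
my $page = '<style>p { color: red; }</style><!-- a > b -->'
         . '<p>Text only</p><script>var x = 1;</script>';

print extract_text($page), "\n";   # prints: Text only
```

The order matters here: comments and script/style blocks are removed while their delimiters are still intact, and only then are the remaining tags stripped.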
