How to strip a string of <html>, </html>, <body>, </body>, <form ... >, </form> tags?

I have a stream that holds the HTML of a page. I want to use only the part of this page that lies between the <form .....> and </form> tags, excluding the tags themselves.
How would I go about stripping the <html>, </html>, <body>, </body>, <form ... >, </form>, <head> and </head> tags? I have to make sure that <head ...javascript..> and its corresponding </head> tags are not stripped in this process.
sun21170
sun21170
8/17/2005 5:58:52 PM

   Wait... you said you wanted to strip the head tags, and you're also saying you need to NOT strip the head tags.  What do you need?

   What if you enclose everything in a div or span tag, just inside the form tags... and then use JavaScript to get myDiv.innerHTML?  Would that get everything you need?

Cheers,


Peter Brunone
MS MVP, ASP.NET
Founder, EasyListBox.com
Do the impossible, and go home early.
PeterBrunone
8/17/2005 7:23:29 PM

Here's a function to strip out all HTML tags...

// Requires: using System.Text.RegularExpressions;
// Removes every tag: anything from '<' up to the next '>', even across line breaks.
private string StripHTML(string htmlString)
{
    string pattern = @"<(.|\n)*?>";
    return Regex.Replace(htmlString, pattern, string.Empty);
}

From here, you need to adjust the pattern, or pull the part you want stripped out of the full markup, run it through the function, and then add it back.

Not exactly what you need, but a good start.
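
For the original question (keeping only what sits between the form tags), one possible adjustment is to capture the form's inner markup rather than stripping everything. The following is only a sketch: the class and method names are made up for illustration, it returns the first form only, and it assumes no attribute value inside the opening <form ...> tag contains a '>' character. Any script in the head is outside the form and would have to be captured separately.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormExtractor
{
    // Returns the markup between the first <form ...> tag and its closing
    // </form>, excluding both tags; returns null when no form is found.
    public static string GetFormContents(string html)
    {
        // Singleline lets '.' match newlines; IgnoreCase also matches <FORM>.
        Match m = Regex.Match(
            html,
            @"<form\b[^>]*>(.*?)</form\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string page = "<html><head><script>var x = 1;</script></head>" +
                      "<body><form method=\"post\" action=\"a.aspx\">" +
                      "<input name=\"q\" /></form></body></html>";
        Console.WriteLine(GetFormContents(page)); // prints: <input name="q" />
    }
}
```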

Zath

Zath
8/17/2005 7:33:59 PM
I'm assuming that doesn't handle HTML-encoded characters as well...?
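
To illustrate the point about encoded characters: the tag-stripping regex leaves entities such as &amp; untouched, so a separate decoding step is needed if plain text is the goal. The sketch below assumes a framework version that has System.Net.WebUtility (newer than what this thread was written against; HttpUtility.HtmlDecode in System.Web plays the same role in older code). The class and method names are hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class TextExtractor
{
    // Strips tags with the same pattern as StripHTML, then turns entities
    // such as &amp; and &lt; back into the characters they stand for.
    public static string StripAndDecode(string html)
    {
        string withoutTags = Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        // prints: Fish & chips <fresh>
        Console.WriteLine(StripAndDecode("<p>Fish &amp; chips &lt;fresh&gt;</p>"));
    }
}
```

Decoding has to happen after stripping; doing it first would turn &lt;fresh&gt; into something the regex treats as a tag.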

Peter Brunone
MS MVP, ASP.NET
Founder, EasyListBox.com
Do the impossible, and go home early.
PeterBrunone
8/17/2005 7:41:20 PM
The point is I did not want the javascript within head tags to be stripped because then the response stream will not render or function correctly in a browser.
sun21170
sun21170
8/17/2005 10:15:24 PM

I suppose now would be a good time to ask (a) where this HTML is coming from, (b) where it is going, and (c) where it will be processed, i.e. client or server.

 


Peter Brunone
MS MVP, ASP.NET
Founder, EasyListBox.com
Do the impossible, and go home early.
PeterBrunone
8/18/2005 1:02:51 AM
Peter,

The answers to your questions are:
(a) I am getting the HTML response for a web page on a different web server. For example, if my site is xxx.com, then from within the code-behind of www.xxx.com/getresults.aspx, I am getting the final HTML response of www.yy.com/findindexes.aspx. I use the WebClient class for this. Then I want to display the output in my getresults.aspx page, which is an empty page except for the code-behind that requests the remote page and receives its HTML output. Note that these URLs are hypothetical ones, so don't expect to get anything when you click on them.

(b) As you must have guessed, it's coming from a remote server, i.e. www.yy.com/findindexes.aspx
(c) It's going to become the final output of www.xxx.com/getresults.aspx

The problem is that if I add the remote page's HTML output to my receiving page, I get an error, since there will be two forms, two html tags, two head tags and two body tags. Actually, I would be very happy if I could somehow make the receiving page have no tags to start with, because then I could simply write the remote response into the receiving page without creating double tags.

sun21170
sun21170
8/18/2005 1:16:04 AM
Note that these URLs are hypothetical ones, so don't expect to get anything when you click on them.

Uh... thanks :)

Actually, I would be very happy if I could somehow make the receiving page have no tags to start with, because then I could simply write the remote response into the receiving page without creating double tags.

What's stopping you?  Parse the whole thing, and then have absolutely no HTML in your getresults page.  I assume you're getting the entire response as a string, the way I've always done it, so just take that string and Response.Write it to the page.  Since it's all plain HTML you won't have any connection to Viewstate, but I wouldn't think that would be a problem.

I'm not sure what kind of error you're getting since you wouldn't have any duplicate *server-side* tags, but if you use this method you won't have to worry about that.
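
As a rough illustration of this approach, the code-behind might look something like the sketch below. It is not taken from the poster's project: the class name is hypothetical, the URL is the placeholder used earlier in the thread, the remote page is assumed to be UTF-8 encoded, and getresults.aspx is assumed to contain nothing but its @ Page directive.

```csharp
using System;
using System.Net;
using System.Text;
using System.Web.UI;

// Hypothetical code-behind for getresults.aspx; the .aspx file itself is
// assumed to contain only the @ Page directive and no HTML.
public class GetResults : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Placeholder URL from the thread, not a real endpoint.
        string url = "http://www.yy.com/findindexes.aspx";

        using (WebClient client = new WebClient())
        {
            // Fetch the remote page as raw bytes.
            byte[] data = client.DownloadData(url);

            // Assumption: the remote page is UTF-8 encoded.
            string html = Encoding.UTF8.GetString(data);

            // The remote markup becomes the entire response body.
            Response.Write(html);
        }
    }
}
```

Because the remote markup is written out unchanged, any relative links, images or form actions in it will resolve against the receiving site rather than the remote one.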

Peter Brunone
MS MVP, ASP.NET
Founder, EasyListBox.com
Do the impossible, and go home early.
PeterBrunone
8/18/2005 1:30:35 AM
I think the empty getresults.aspx has the page directive, which should not be a problem, but it also has html, body and form tags with no controls inside them. This is what is created by default for an aspx page in VS 2003. I guess I should try removing all tags in getresults.aspx except for the page directive. Maybe that will solve my problem.

Anyways, I will try your suggestion tomorrow and see if it helps.
sun21170
sun21170
8/18/2005 1:41:46 AM

You should be able to remove all the HTML and not have it grow back... as long as you *never* open that page in Design View.

If Response.Write gives you trouble, add a single Literal to the HTML and put the content there instead.
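
A minimal sketch of the Literal variant, assuming getresults.aspx declares a single <asp:Literal id="RemoteContent" runat="server" /> control after the page directive. The control id, class name and FetchRemoteHtml helper are all hypothetical; the helper stands in for whatever WebClient call already retrieves the remote page.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.WebControls;

// Hypothetical code-behind; assumes the .aspx declares
// <asp:Literal id="RemoteContent" runat="server" /> and no other markup.
public class GetResults : Page
{
    // Bound to the Literal declared in the .aspx file.
    protected Literal RemoteContent;

    protected void Page_Load(object sender, EventArgs e)
    {
        // The Literal renders its Text as-is, without wrapping tags.
        RemoteContent.Text = FetchRemoteHtml();
    }

    // Stand-in for the WebClient request described earlier in the thread.
    private string FetchRemoteHtml()
    {
        return "<p>remote markup goes here</p>";
    }
}
```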

If for whatever reason you can't do it this way, feel free to use my GetPiece function.  It's a bit wide (sorry about that), but it takes a string of content and gives you what's left between the two strings you specify (so feeding it myResult, "<head>", and "</body>" will give you the main content; then it's just up to you to do a few String.Replace operations).

Function GetPiece(ByVal strBody As String, ByVal strBegin As String, ByVal strEnd As String) As String
    Dim strResult As String = ""
    Try
        strResult = strBody.Substring((strBody.IndexOf(strBegin) + strBegin.Length), (strBody.IndexOf(strEnd) - (strBody.IndexOf(strBegin) + strBegin.Length)))
    Catch e As Exception
        If strResult = "" Then strResult = "error"
    Finally

    End Try
    Return strResult
End Function ' GetPiece
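
For anyone working in C#, a comparable helper could be sketched as below. It is not a line-for-line port: this version looks for the end marker only after the begin marker, ignores case, and returns null instead of the string "error" when a marker is missing. The class name is made up for the example.

```csharp
using System;

public static class StringPieces
{
    // Returns the text between the first occurrence of begin and the next
    // occurrence of end after it, or null when either marker is missing.
    public static string GetPiece(string body, string begin, string end)
    {
        int start = body.IndexOf(begin, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += begin.Length;

        int stop = body.IndexOf(end, start, StringComparison.OrdinalIgnoreCase);
        if (stop < 0) return null;

        return body.Substring(start, stop - start);
    }

    public static void Main()
    {
        string page = "<html><head><title>t</title></head>" +
                      "<body><p>Hi</p></body></html>";
        Console.WriteLine(GetPiece(page, "<body>", "</body>")); // prints: <p>Hi</p>
    }
}
```

Note that the markers are matched literally, so "<body>" will not find a tag written with attributes, such as <body class="x">.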


Peter Brunone
MS MVP, ASP.NET
Founder, EasyListBox.com
Do the impossible, and go home early.
PeterBrunone
8/18/2005 1:56:39 AM

I think that's a great function. I will definitely use that. Thanks.


sun21170
sun21170
8/18/2005 2:03:57 AM
It worked by removing all tags from the receiving page.
Thanks to everybody for their help.

sun21170
sun21170
8/21/2005 6:10:31 PM