- date: Thu Nov 4 18:15:58 2004
- from: docx at io.com (Dylan Northrup)
- in-reply-to: <[email protected]>
- subject: [ale] Extraction of address and pages
:=I'm trying to get http://addr:port/page
:=
:=from:
:=
:=GET http://www.google.com/ HTTP/1.1
:=
:=this sucks as it is too greedy. Anyone have a suggestion.
:=$m =~ m/http:\/\/(.+)\/\s+/;
docx> cat foo.pl
#!/usr/bin/perl -w
$url[0] = 'http://www.google.com:80/gmail HTTP/1.1';
$url[1] = 'http://www.google.com/gmail HTTP/1.1';
$url[2] = 'http://www.google.com/ HTTP/1.1';
$url[3] = 'http://www.google.com/';
$url[4] = 'http://www.google.com:80/';
foreach $url (@url) {
    $host_port = ''; $page = ''; $protocol = '';
    # The non-greedy (.*?) stops at the first "/" after the host, which is
    # what keeps this from over-matching the way (.+) does.
    ($host_port, $page, $protocol) = $url =~ m#http://(.*?)/([^\s]*)\s*(.*)#;
    $host = $host_port; $port = '';
    # Split off the port only when one is actually present.
    ($host, $port) = split /:/, $host_port if $host_port =~ /:/;
    print "host: $host\nport: $port\npage: $page\nprotocol: $protocol\n--\n";
}
docx> ./foo.pl
host: www.google.com
port: 80
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol:
--
host: www.google.com
port: 80
page:
protocol:
--
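
If you'd rather not do the split as a second step, the port can probably be pulled out in the same pattern. This is only a sketch: parse_request_url is a name made up for illustration, and it assumes plain http:// URLs with a numeric port and no userinfo. The character class [^/:\s]+ can't run past a ":" or "/", so greediness stops being a problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of foo.pl): splits the URL in a request line
# into host, port, page, and protocol using a single pattern.
sub parse_request_url {
    my ($line) = @_;
    # [^/:\s]+ cannot cross ":" or "/", so no non-greedy quantifier is needed.
    # (?::(\d+))? captures the port only when one is present.
    return unless $line =~ m{http://([^/:\s]+)(?::(\d+))?/(\S*)\s*(.*)};
    # $2 is undef when there was no port; hand back an empty string instead.
    return ($1, defined $2 ? $2 : '', $3, $4);
}

for my $line ('http://www.google.com:80/gmail HTTP/1.1', 'http://www.google.com/') {
    my ($host, $port, $page, $protocol) = parse_request_url($line);
    print "host: $host\nport: $port\npage: $page\nprotocol: $protocol\n--\n";
}
```

The pattern isn't anchored, so a full request line such as "GET http://www.google.com/ HTTP/1.1" should go through the same way. A line with no http:// URL in it returns an empty list.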
--
Dylan Northrup - docx at io.com - http://www.io.com/~docx/
"Harder to work, harder to strive, hard to be glad to be alive, but it's
really worth it if you give it a try." -- Cowboy Mouth, 'Easy'
</pre>
<!--X-References-->
<ul><li><strong>References</strong>:
<ul>
<li><strong><a name="00171" href="msg00171.html">[ale] Extraction of address and pages</a></strong>
<ul><li><em>From:</em> cfowler at outpostsentinel.com (Christopher Fowler)</li></ul></li>
</ul></li></ul>
<!--X-References-End-->
</body>
</html>