
[no subject]



:=I'm trying to get http://addr:port/page
:=
:=from:
:=
:=GET http://www.google.com/ HTTP/1.1
:=
:=This sucks, as it is too greedy. Anyone have a suggestion?
:=$m =~ m/http:\/\/(.+)\/\s+/;
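A quick way to see why `(.+)` misbehaves is to run the common forms of the match against one request line. This is a minimal sketch: the `:80` port in the sample line and all the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample request line (port added here for illustration).
my $line = 'GET http://www.google.com:80/gmail HTTP/1.1';

# Greedy: (.+) runs to the end of the line, then backtracks to the
# LAST '/', which is the one inside "HTTP/1.1".
my ($greedy) = $line =~ m#http://(.+)/#;

# Non-greedy: (.+?) stops at the FIRST '/' after "http://".
my ($lazy) = $line =~ m#http://(.+?)/#;

# Negated class: [^/]+ can never cross a '/', so greediness is moot.
my ($class) = $line =~ m#http://([^/]+)/#;

# Whole URL: \S+ stops at the first whitespace character.
my ($url) = $line =~ m#(http://\S+)#;

print "greedy:     $greedy\n";   # www.google.com:80/gmail HTTP
print "non-greedy: $lazy\n";     # www.google.com:80
print "class:      $class\n";    # www.google.com:80
print "url:        $url\n";      # http://www.google.com:80/gmail
```

If all you want is the full http://addr:port/page string, the last form is enough; the script below goes further and breaks it into host, port, page and protocol.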

docx> cat foo.pl
#!/usr/bin/perl
use strict;
use warnings;

my @url = (
        'http://www.google.com:80/gmail HTTP/1.1',
        'http://www.google.com/gmail HTTP/1.1',
        'http://www.google.com/ HTTP/1.1',
        'http://www.google.com/',
        'http://www.google.com:80/',
);

foreach my $url (@url) {
        # (.*?) is non-greedy, so it stops at the first '/' after "http://"
        # instead of running on to the last '/' (the one in "HTTP/1.1").
        my ($host_port, $page, $protocol) = $url =~ m#http://(.*?)/(\S*)\s*(.*)#;

        # Split "host:port" only when a port is present.
        my ($host, $port) = ($host_port, '');
        ($host, $port) = split /:/, $host_port if $host_port =~ /:/;

        print "host: $host\nport: $port\npage: $page\nprotocol: $protocol\n--\n";
}
docx> ./foo.pl
host: www.google.com
port: 80
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page: gmail
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol: HTTP/1.1
--
host: www.google.com
port:
page:
protocol:
--
host: www.google.com
port: 80
page:
protocol:
--
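If you'd rather not split the port out afterwards, it can be an optional group in the same regex, and a leading method such as GET can be skipped at the same time. A minimal sketch, assuming plain http:// URLs with no user:password@ part and Perl 5.10 or later (for the `//` operator); `parse_request_line` is a hypothetical helper name, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "GET http://host:port/page HTTP/1.1"
# (method and protocol optional) into host, port, page, protocol.
sub parse_request_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(?:[A-Z]+\s+)?      # optional method, e.g. "GET "
        http://
        ([^/:\s]+)           # host: stops at '/', ':' or whitespace
        (?::(\d+))?          # optional :port
        /(\S*)               # page: up to the next whitespace
        \s*(.*)$             # optional protocol, e.g. "HTTP/1.1"
    }x;
    return ($1, $2 // '', $3, $4);
}

for my $line ('GET http://www.google.com:80/gmail HTTP/1.1',
              'http://www.google.com/') {
    my ($host, $port, $page, $protocol) = parse_request_line($line);
    print "host: $host\nport: $port\npage: $page\nprotocol: $protocol\n--\n";
}
```

The `[^/:\s]+` class does the same job as the non-greedy `(.*?)` in foo.pl: it simply can't run past a '/' or ':', so greediness stops mattering. It won't cope with IPv6 literals such as [::1].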



-- 
Dylan Northrup - docx at io.com - http://www.io.com/~docx/
"Harder to work, harder to strive, hard to be glad to be alive, but it's 
 really worth it if you give it a try." -- Cowboy Mouth, 'Easy'


</pre>
<ul><li><strong>References</strong>:
<ul>
<li><strong><a name="00171" href="msg00171.html">[ale] Extraction of address and pages</a></strong>
<ul><li><em>From:</em> cfowler at outpostsentinel.com (Christopher Fowler)</li></ul></li>
</ul></li></ul>
</body>
</html>