Date: Fri, 8 Jul 2005 15:15:43 -0400
From: jonkettenhofen at yahoo.com (Jon Kettenhofen)
Subject: [ale] Parsing CSV file in perl
#!/usr/bin/perl
use strict;
use warnings;

open my $txt, '<', 'datafile' or die "Can't open datafile: $!";
while (<$txt>) {
    if ( m/^[ \t]*\d+[ \t]+/ ) {
        # line starts with a numeric ID; $_ still ends with its newline
        print $_ or die "print failed: $!";
    } elsif ( m/^[ \t]*"ID"[ \t]+/ ) {
        # header row: do nothing - don't print this line
    } else {
        print ">>missing ID on this line: $_";
    }
}
close $txt;
Note: each [ \t] character class in the above code matches a space or a tab.
There were no tabs in the email you posted, so I chose to compensate in the
code by allowing both. I did not understand your regex - see comment below.
For the "else" branch, you could choose not to print the line, or handle it
some other way.
If you want to actually parse the line to test other fields for being empty,
I've seen some scripts online that use Text::ParseWords. I could not find any
reference to a perl command that assigns tokens into positional parameters
(a la $1 $2 $3 etc.) in the way that set does in the Korn shell.
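For what it's worth, split may come close to what you need. Here is a rough
sketch, assuming the columns really are separated by single tabs: a negative
limit makes split keep empty columns, including trailing ones, so each field
can be tested. The sub name parse_row and the sample lines are made up for
illustration; they are not from your script or data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_row is a hypothetical helper, not part of the original script.
# It splits one tab-delimited line into fields and strips enclosing quotes.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 makes split keep empty trailing fields
    my @fields = split /\t/, $line, -1;
    # Remove one pair of enclosing double quotes, if present
    s/^"(.*)"$/$1/ for @fields;
    return @fields;
}

# Made-up sample rows; the first one has an empty ID column
for my $line ("\t\"Adams\"\t\"Portia\"\n", "10572\t\"Alexander\"\t\"Robert\"\n") {
    my ($id, $last, $first) = parse_row($line);
    if ($id eq '') {
        print "missing ID: $last, $first\n";
    } else {
        print "$id: $last, $first\n";
    }
}
```

This only handles simple quoting. Fields that contain embedded tabs or
escaped quotes would need a real CSV parser.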
>I'm trying to parse a CSV file in perl and I'm having an issue with some
>of the columns being blank.
>
>Here is a sample piece of data.
>
>Id LASTNAME FIRSTNAME
> Adams Portia
>10572 Alexander Robert
>
>You can see that the first row does not have an ID. This can be true
>for all columns. They may or may not have values.
>
>Here is how I'm trying to parse it:
>
>open TXT, "< Expanded_2005_Select_1.csv";
>while(<TXT>) {
> m/^(\d+?)\t/;
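OK, in this regex the parens are only needed because you read $1 afterwards
(is this Perl 5 or 6?), and the \d+? is a non-greedy "one or more digits",
so it still requires at least one digit; \d* would also match an empty
column. Did you mean to put "(\d+)?" ?
Not sure what you were thinking here.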
> print "$1\n";
Perl on my Mac OS X barfed on the print statement.
>}
>
>Each column is tab delimited. When I run this I get the lastname in $1
>for the first line and the ID in $1 for the second line. I need to
>somehow create a regex that would be unforgiving of nothing being there.
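One way to make the regex tolerate an empty first column, offered as a sketch
rather than a definitive fix: use \d* so that zero digits still match, and
only read $1 when the match actually succeeded. The sub name first_column is
hypothetical, and the sample lines stand in for your tab-delimited data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# first_column is a hypothetical helper for illustration only.
# Returns the digits before the first tab ('' if that column is empty),
# or undef if the line does not start with optional digits and a tab.
sub first_column {
    my ($line) = @_;
    return $line =~ m/^(\d*)\t/ ? $1 : undef;
}

# Made-up sample lines: empty ID, numeric ID, and a header row
for my $line ("\tAdams\tPortia\n", "10572\tAlexander\tRobert\n", "Id\tLASTNAME\tFIRSTNAME\n") {
    my $id = first_column($line);
    if    (!defined $id) { print "no match: $line" }
    elsif ($id eq '')    { print "empty ID: $line" }
    else                 { print "ID $id\n" }
}
```

Checking the result of the match before touching $1 matters because a failed
match leaves $1 with whatever it held before.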
>
>Data file looks like this:
> 1 "ID" "LASTNAME" "FIRSTNAME" "TITLE" "COMPANY"
>"ADDRESS " "ADDRESS2" "CITY" "STATE" "ZIPCODE"
>"COUNTRY" "PHONE" "EMAIL" "REGTYPE" "DATE" "TIME"
>"Question1" "Questio n2" "Question3" "READERID"
> 2 "Adams" "Portia" "Director" "The Rockefeller
>Univers ity" "1230 York Ave " "New York"
>"NY" "10021-6 399" "USA" 2123277719
>"adams at rockefeller.edu" "Member"
> 3 10572 "Alexander" "Robert" "Manager Voice & Video
>Solution" "Air Products and Chemicals, Inc" "7201
>Hamilton Blvd" "Allentown" "PA" "18195-1501"
>"USA" "610-481-7156" "alexanrw at airproducts.com" "Member"
>06/12/2005 06:06:14 pm 60711
>
>The 1, 2, 3 that you see are the line numbers in vi
>
>
>_______________________________________________
>Ale mailing list
>Ale at ale.org
>http://www.ale.org/mailman/listinfo/ale
</pre>
</body>
</html>