[ale] Another Large File/PERL/Awk/Sed question...
With a really gigantic file, in the special case where you are only
replacing bytes one for one (i.e. the file is the same size before and
after the edit), you can use seek() to make this a lot quicker. On a
test with a 500 MB file, this takes 0.003 sec elapsed time versus
14.2 sec for the sed -i approach. Of course, this *only* works if the
replacement is exactly as long as the original text: a longer first
line overwrites the start of the second line, and a shorter one leaves
stale bytes from the old line behind. Either way the file is corrupted.
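#!/usr/bin/perl -w
use strict;
# Open read/write without truncating the file.
open(my $fh, '+<', 'test.file') or die "Cannot open test.file: $!";
my $firstLine = <$fh>;
# If you change the number of bytes, data is corrupted.
$firstLine =~ s/column4;column4/column4;column5/;
seek($fh, 0, 0) or die "Cannot seek: $!";   # back to the start of the file
print {$fh} $firstLine;
close($fh) or die "Cannot close test.file: $!";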
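
For anyone who would rather stay in the shell, the same overwrite-in-place
idea can probably be sketched with dd and conv=notrunc. This is only a
rough sketch, not a tested tool: the sample contents written to test.file
are made up for illustration, and it assumes the header is plain
single-byte text so that character counts equal byte counts.

```shell
#!/bin/sh
# Sketch: overwrite bytes in place without rewriting the rest of the file.
# test.file is created here as a small stand-in for the real data file.
set -eu

file=test.file
printf 'column1;column2;column3;column4;column4;column6\n1;2;3;4;5;6\n' > "$file"

old='column4;column4'
new='column4;column5'

# Refuse to continue unless the replacement is exactly as long as the original.
[ "${#old}" -eq "${#new}" ] || { echo "length mismatch" >&2; exit 1; }

size_before=$(( $(wc -c < "$file") ))

# Offset (0-based) of the first match within the header line.
header=$(head -n 1 "$file")
prefix=${header%%"$old"*}
[ "$prefix" != "$header" ] || { echo "pattern not found" >&2; exit 1; }
offset=${#prefix}

# conv=notrunc keeps dd from truncating the file after the written bytes.
printf '%s' "$new" | dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null

size_after=$(( $(wc -c < "$file") ))
[ "$size_before" -eq "$size_after" ] || { echo "size changed" >&2; exit 1; }

head -n 1 "$file"
```

The length check up front is the important part: it turns the "file size
must stay the same" rule into something the script enforces before it
touches the file.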
On Tue, Dec 1, 2009 at 4:59 PM, JK <jknapka at kneuro.net> wrote:
> Richard Bronosky wrote:
>> in sed, use '[range/line number]{commands}' to limit where the edits
>> are made. Example:
>> mount |sed '1{s/type/TEST/}'
>>
>> what you want is:
>> sed -i.bak '1{s/column4/column5/}' filename
>
> Interesting...
>
> For single commands, address-space-command also works, so:
>
>   sed -i -e '1 s/column4;column4/column4;column5/' filename
>
> would work as well as the {} version. My previous attempt
> was wrong though, in that 0 is not a valid address in at
> least some versions of sed. In fact, the man page for 4.2.1
> says 0 IS valid, and means "really for sure start matching
> at the very first line" (there are some circumstances where
> 1 will NOT match, to wit, if line 1 matches the regexp
> specified as the ending address); but 4.2.1 does not actually
> accept that syntax.
>
> -- JK
>
>> A backup filename.bak will be created with that command. drop the
>> -i.bak if you don't want it.
>>
>>
>> On Tue, Dec 1, 2009 at 4:06 PM, Bob Kruger <bkruger at mindspring.com> wrote:
>>> All;
>>>
>>> Thanks to all who assisted me with my earlier question on deleting the semicolon from the end of a line. I have another one that may be a bit stickier.
>>>
>>> Again I have a large data file in text format, this one is 3.2GB. Same as before, the fields are semicolon delimited. The first line of the file is the column name. However, I have two columns that were inadvertently given the same column name.
>>>
>>> Example:
>>>
>>> column1;column2;column3;column4;column4;column6;column7....
>>>
>>> I would like to change the second instance of column4 to column5 on the first line of the file. I thought it would be simple to fire up vi and just do a simple text edit. The edit part was simple, but the saving of the file is taking hours.
>>>
>>> Any thoughts or ideas using PERL, Awk, or Sed?
>>>
>>> Thanks in advance for any assistance.
>>>
>>> V/r
>>>
>>> Bob
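
The two sed forms quoted above (the {} block and the plain "1 s/.../"
address) can be compared on a throwaway file before pointing either at
the real data. A minimal sketch, assuming a made-up sample.txt whose
second line also contains the pattern, and leaving out -i so that
nothing is modified in place:

```shell
#!/bin/sh
set -eu

sample=sample.txt
# Line 2 deliberately contains the same text, to show it is left alone.
printf 'column1;column2;column3;column4;column4;column6\ncolumn4;column4\n' > "$sample"

# Both forms restrict the substitution to line 1.
braces=$(sed '1{s/column4;column4/column4;column5/;}' "$sample")
plain=$(sed '1 s/column4;column4/column4;column5/' "$sample")

[ "$braces" = "$plain" ] && echo "forms agree"
printf '%s\n' "$plain"
```

The ";" before the closing brace is there only for portability; some sed
implementations want a command terminator before "}".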
--
Björn Gustafsson