Wednesday, June 23, 2010

Perl: writing to a file at both ends

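Writing to a file in Perl 5 (Perl 6 has a few changes) is easy, and the available options are selected by the mode given when opening the file. Here is a list of the different ways to write to a file and the change each one brings about in the file. In each example, outputfile.txt initially contains the two lines "previous" and "text".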
-------------------------------------------------------------------
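#!/usr/local/bin/perl
# "+>>" opens for reading and appending: every write lands at the end of the file
open(FILE, "+>>outputfile.txt") or die "Cannot open outputfile.txt: $!";
print FILE "hello\n";
close(FILE);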
-------------------------------------------------------------------
Output: (appended at the end of the file)
-------------------------------------------------------------------
previous
text
hello
-------------------------------------------------------------------
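#!/usr/local/bin/perl
# "+>" opens for reading and writing, truncating the file first
open(FILE, "+>outputfile.txt") or die "Cannot open outputfile.txt: $!";
print FILE "hello\n";
close(FILE);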
-------------------------------------------------------------------
Output: (note that the previous text is erased; the file is truncated, or created if it does not exist)
-------------------------------------------------------------------
hello
-------------------------------------------------------------------
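#!/usr/local/bin/perl
# "+<" opens an existing file for reading and writing without truncating it
open(FILE, "+<outputfile.txt") or die "Cannot open outputfile.txt: $!";
print FILE "hello\n";
close(FILE);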
-------------------------------------------------------------------
Output: (note that the previous text is overwritten starting at the position of the file pointer at the time of printing, here the beginning of the file)
-------------------------------------------------------------------
hello
us
text
-------------------------------------------------------------------

Unfortunately, if we want to add text at the beginning of the file, we need to copy the previous text and write it all back in after writing the new text. In such cases a few of the Perl modules can solve the problem. In principle, it is possible for several programs to be writing to the beginning and the end of a file at (nearly) the same time.
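The copy-and-write-back approach can be sketched roughly as follows. This is an illustration in Python rather than Perl, and it is not one of the modules mentioned above. The function name prepend_text and the scratch file are made up for the example, and the sketch assumes the file is small enough to hold in memory and that no other program is writing to it at the same time.

```python
import os
import tempfile


def prepend_text(path, new_text):
    """Put new_text at the start of the file at path, keeping the old content."""
    # "r+" is Python's counterpart of Perl's "+<": read/write, no truncation.
    with open(path, "r+") as handle:
        old_text = handle.read()   # copy the previous text
        handle.seek(0)             # move the file pointer back to the start
        handle.write(new_text)     # write the new text first
        handle.write(old_text)     # then write the previous text back


if __name__ == "__main__":
    # Hypothetical scratch file standing in for outputfile.txt.
    with tempfile.TemporaryDirectory() as workdir:
        demo_path = os.path.join(workdir, "outputfile.txt")
        with open(demo_path, "w") as handle:
            handle.write("previous\ntext\n")
        prepend_text(demo_path, "hello\n")
        with open(demo_path) as handle:
            print(handle.read(), end="")
```

Because the rewritten content is never shorter than the original, nothing stale is left at the end of the file and no explicit truncation is needed.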

Thursday, June 17, 2010

Unix tail bug?

The UNIX tail command is probably one of the most widely used Unix commands. There seems to be a "bug", if you want to call it that, in its functionality.

Try running tail -5 on a directory whose files have data being written to them continuously, such as log files. Something like "tail -5 *" should get you just the last 5 lines from each of the files in the directory.

However, try running it a few times and you will be surprised to see that for a few of the files more than 5 lines are displayed!

If you are unlucky enough to use this in a shell script that requires exactly 5 lines and no more, the script will fail some of the time but seem to work just fine at other times: a transient bug that can slip past the best testing.

This happens only when you use wildcards to specify the files, so presumably the error occurs because the command fails to realize it has already read a particular file. Using other non-standard utilities such as since or multitail might solve the problem. The programmer could even get clever and read just the top 5 lines of tail's output for each file.
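For a script that must never see more than 5 lines per file, one option is to enforce the cap itself rather than rely on tail. The following is a minimal sketch in Python, not a fix for tail; the names last_lines and tail_all are made up for the example, and it assumes the files are text and small enough to scan from the start.

```python
import glob
import os
from collections import deque


def last_lines(path, count=5):
    """Return at most `count` trailing lines of the file at `path`."""
    with open(path, "r", errors="replace") as handle:
        # A deque with maxlen keeps only the newest `count` lines it is fed,
        # so the result can never be longer than `count`.
        return list(deque(handle, maxlen=count))


def tail_all(pattern="*", count=5):
    """Map each regular file matching `pattern` to its last `count` lines."""
    return {
        path: last_lines(path, count)
        for path in sorted(glob.glob(pattern))
        if os.path.isfile(path)
    }


if __name__ == "__main__":
    for name, lines in tail_all("*", 5).items():
        print("==> %s <==" % name)
        print("".join(lines), end="")
```

Each file is opened and read exactly once per call, so a file that grows while the others are being read cannot push the per-file count past the limit.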

More importantly, is this a bug in Unix tail, in the wildcards, or just an undesired side effect we have to live with? I could see this "bug" in my 9.8. I am not sure whether it is endemic to this flavor. When I tried the same with the GNU core utilities and openSUSE, I had to use the correct form "tail -n 5 *", and this works fine. It is always better to have the GNU core utilities than to use the standard utilities that come with some of the less frequently updated flavors.

Tuesday, June 8, 2010

Ingenious ways of hiding e-mail from spam bots

Putting your e-mail address on your website or blog is an open invitation to spam bots to harvest it and start spamming your inbox or congesting your spam folder. A congested spam folder is a big problem if the spam filter puts genuine mail there. Having to sort through the spam can be a nightmare if the folder is full.

We have seen the obvious ways of protecting an address, such as using [at] instead of @, which unfortunately is not good enough as spam bots evolve too. Having an image show the mail id is mostly safe, but makes it impossible for the user to copy and paste it. Services such as reCAPTCHA have made it possible to protect your mail id with an image. However, for the lazy user this may be too much work just to see the mail id.

A few ingenious ways to hide a mail id in plain sight:
  1. Remove the "meaning of the word that will confuse the spam bot" from mymailid@confusespambot.actualdomain.com
  2. Use the popular version of the domain in mymailid@geemail.com
  3. Do the math in mymailid@three-two23.com
  4. No numbers in my1mail2id@356domain.com
If this is not enough to confuse the spam bots, we could always use other methods, such as a server-side contact form that hides the mail id, or a client-side script that renders the mail id.
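To make scheme 4 from the list above concrete, here is a minimal Python sketch of the step a human reader is asked to perform by hand. The function name strip_digits is made up for the example, and the sketch assumes the real address contains no digits of its own.

```python
def strip_digits(obfuscated):
    """Undo the "no numbers" scheme by dropping every digit from the address."""
    return "".join(ch for ch in obfuscated if not ch.isdigit())


if __name__ == "__main__":
    print(strip_digits("my1mail2id@356domain.com"))  # prints mymailid@domain.com
```

An address that legitimately contains digits would be mangled by this rule, so the scheme only suits addresses without them.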