Tuesday, March 31, 2009

iDirt

One more natural Apple advertisement. In Patras as well...

Saturday, February 14, 2009

7-Zip vs WINRAR

WINRAR (current version: 3.80) was, and still is, popular. You can tell by the number of people seeding the torrent of its cracked version on The Pirate Bay (currently 2308 seeding versus 16 leeching). And people have good reasons to buy (or crack) such a program:
  • Very beautiful interface that can be themed.
  • High compression ratios.
  • Quite fast.
  • Supports multithreaded compression.
  • Supports archives with recovery records.
  • Portable version available.
  • Linux command line version available.
  • Available in 47 languages.
  • Supports creation of SFX RAR and ZIP archives.
  • Full support for RAR and ZIP.
  • Extraction-only support for the following 12 formats: CAB, ARJ, LZH, TAR, GZ, BZ2, ACE, UUE, JAR, ISO, 7Z, Z.
  • AES-128 encryption for RAR files.
  • Is programmed by a Russian, etc.
But, first of all, WINRAR is not free, neither in price nor in the sense the FSF means. It currently costs around $30.

On the other hand, we have 7-Zip (current version: 4.65):
  • 7-Zip offers even better compression ratios.
  • 7-Zip is slower than WINRAR for some compression ratios but faster for others. (That's something I did not expect; keep reading.)
  • It is smaller in size.
  • Supports multithreaded compression.
  • There is also a portable version.
  • There is also a Linux command-line version.
  • It is available in 74 languages. (Nice symmetry with WINRAR's 47 languages.)
  • Supports creation of SFX archives in 7z format.
  • Offers full support for the following 5 formats: 7z, ZIP, GZIP, BZIP2 and TAR. The ZIP support offers higher compression ratios than that of WINRAR but no SFX support.
  • Offers extraction-only support for the following 18 formats: ARJ, CAB, CHM, CPIO, DEB, DMG, HFS, ISO, LZH, LZMA, MSI, NSIS, RAR, RPM, UDF, WIM, XAR and Z.
  • AES-256 encryption both for 7z and ZIP files.
  • It is also programmed by a Russian ;-)
  • But, most importantly, 7-Zip is completely free and open source!
So what's the catch?

7-Zip used to be buggy and that's perhaps the reason why some people are afraid to use it. The latest versions are really stable. I haven't encountered any strange errors and I use 7-Zip a lot.

7-Zip lacks some features like the recovery records. But do people need them nowadays?
Recovery records are used to recover data from damaged RAR archives. You definitely need them if you store backups on unreliable media. But in my experience most people back up to external hard drives, which, when they fail, fail completely. Recovery records are only useful when bad sectors appear under the RAR file. They are also completely useless when you transfer RAR files over a P2P network such as BitTorrent, since most (if not all) such networks verify the downloaded data in their clients (BitTorrent, for example, checks a hash of every piece) and simply re-download bad pieces. In any case, you could use PHPar2 to create recovery records for any file, including 7-Zip archives.
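To give an idea of how recovery data can rebuild a damaged part of a file, here is a minimal Lua sketch of the simplest possible scheme: a single XOR parity block, which can restore exactly one lost block. Real tools use more elaborate schemes (PAR2, for instance, uses Reed-Solomon codes, which can repair many blocks), so this is only an illustration of the idea, not how RAR or PHPar2 actually work. The function names are made up, and the ~ (bitwise XOR) operator assumes Lua 5.3 or later.

```lua
-- Sketch of XOR parity (needs Lua 5.3+ for the ~ bitwise XOR operator).
-- NOT the RAR or PAR2 recovery algorithm; function names are hypothetical.

-- XOR two equal-length strings byte by byte.
local function xor_strings(a, b)
  local out = {}
  for i = 1, #a do
    out[i] = string.char(a:byte(i) ~ b:byte(i))
  end
  return table.concat(out)
end

-- The parity block is the XOR of all (equal-length) data blocks.
local function make_parity(blocks)
  local parity = blocks[1]
  for i = 2, #blocks do
    parity = xor_strings(parity, blocks[i])
  end
  return parity
end

-- Rebuild the single missing block (marked with false): XORing the parity
-- with every surviving block cancels them out and leaves the missing one.
local function recover(blocks, parity)
  local missing, acc = nil, parity
  for i = 1, #blocks do
    if blocks[i] then
      acc = xor_strings(acc, blocks[i])
    else
      missing = i
    end
  end
  return missing, acc
end

local data = { "abcd", "efgh", "ijkl" }
local parity = make_parity(data)
local damaged = { "abcd", false, "ijkl" } -- block 2 was lost
local idx, block = recover(damaged, parity)
print(idx, block) -- prints 2 and efgh, separated by a tab
```

One parity block costs as much space as one data block and survives the loss of any single block; recovering more damage needs more redundancy and a stronger code.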

7-Zip's interface is simpler than that of WINRAR as you can see:

WINRAR
7-Zip

7-Zip Archive Creation
This is the window you will see every time you add files to an archive. It contains every option you will ever need. In addition to these, 7-Zip supports lots of advanced options which you will probably never use, but if you ever want to, read about the command-line parameters in the help file and add them to the Parameters text field of the window above.

The most important things in a compression program are the compression ratios it achieves and its speed. I tested both programs, using their native formats, on 4 different datasets at each of the 7 supported compression levels, with solid archives enabled. Here are the results:

  1. Dataset 1: INSHAME folder. This is the directory where I keep the source code of my programs, the zipfiles that I distribute, some BMPs, and some other zipfiles. I usually back this up, so I thought it would be a good dataset for comparing the programs. I can't provide this dataset since it contains personal files.
    You can see the results by clicking here or take a look at this graph directly:

    As you can see, WINRAR has the fastest compression while 7-Zip has the highest compression ratio. WINRAR's Fastest and Fast compression beat 7-Zip's. But after Normal compression, 7-Zip beats WINRAR's compression ratios for the same compression time.

  2. Dataset 2: Globulation 2 folder. Globulation 2 is an open-source RTS game. This dataset is almost identical to the game folder after installing the game, which is available from their site, but it is also available on request (email me).
    You can see the results by clicking here or take a look at this graph directly:

    Although WINRAR has the fastest compression again, 7-Zip's Fastest and 7-Zip's Fast are better than what WINRAR would have accomplished in the same compression time (supposing that the time/size relation is linear.) Strangely, WINRAR's Good is faster than WINRAR's Normal and faster than what 7-Zip would have accomplished in the same compression time.

  3. Dataset 3: pup_save-Tritonio.2fs. I have installed Puppy Linux on my flash drive, and this is its persistent storage file. I usually back this up in case it gets corrupted. This dataset is available on request (if I have stored any passwords in there, I will erase them before sending the file).
    You can see the results by clicking here or take a look at this graph directly:

    As always WINRAR offers the fastest possible compression. This time 7-Zip's Fastest offers better compression than what WINRAR would have accomplished in the same compression time. WINRAR's Fast, Normal and Good are better than what 7-Zip would have accomplished in the same compression time. WINRAR's Best is not as good as what 7-Zip would have offered in the same compression time. 7-Zip also offers the highest possible compression.

  4. Dataset 4: QEMU folder. This folder contains QEMU and QEMU Manager as they are available from their site. Since the dataset may contain some of my user settings, it is available on request.
    You can see the results by clicking here or take a look at this graph directly:

    This time WINRAR does better than what 7-Zip would have achieved in the same compression time. Still, 7-Zip offers the highest compression ratio if you are willing to spend some extra time.
I have heard that WINRAR is much faster than 7-Zip. But from what I saw that's not always the case. WINRAR's Fastest is always faster than 7-Zip's Fastest but has a much lower compression ratio. There are cases, though, where 7-Zip offers better ratios than what WINRAR would offer in the same time. Also in every case 7-Zip's Ultra offers the best compression ratio.
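The "what WINRAR would have accomplished in the same compression time" comparisons above rest on the assumption, stated for dataset 2, that archive size changes linearly with time between two measured levels. A minimal Lua sketch of that arithmetic follows; ratio and size_at are hypothetical helper names, and the numbers are made up for illustration (they are not read from the graphs).

```lua
-- Compression ratio: compressed size / original size (lower is better).
local function ratio(original, compressed)
  return compressed / original
end

-- Linear interpolation between two measured levels (t1, size1) and
-- (t2, size2) of the same program: the size it "would have" reached at
-- time t, under the linearity assumption from the text.
local function size_at(t, t1, size1, t2, size2)
  return size1 + (size2 - size1) * (t - t1) / (t2 - t1)
end

-- Made-up numbers (MB and seconds), not real measurements:
local original = 100
local a_time, a_size = 20, 52                      -- program A at one level
local b_estimate = size_at(a_time, 10, 60, 30, 50) -- program B between two levels

print(ratio(original, a_size), ratio(original, b_estimate))
if a_size < b_estimate then
  print("A beats what B would have achieved in the same time")
end
```

With these numbers, B's interpolated size at 20 seconds is 55 MB, so A's 52 MB archive is the better result for the same compression time.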

So if you don't actually care about the fancy interface and prefer a simpler one, and if you don't need recovery records, why spend $30 on WINRAR when there is an open-source alternative offering equally good compression ratios and compression times and supporting even more formats? Why not spend the money on The Orange Box instead? :-)

Friday, January 30, 2009

iBroke

Apple is known for its innovative advertisements:

Tuesday, January 27, 2009

AskWise v1.1.0

I would like to test this version a bit more, but it seems stable, so I am releasing it. The new features, as listed in the changelog, are:

v1.1.0: The stiffness has been removed. Some minor improvements to the prediction algorithm. The database format has also changed; to upgrade a database to the new format, just delete its second line. Batch prediction feature added*. Nano is now the default external editor for Linux.

*The batch prediction feature will help you get lots of predictions at once by feeding AskWise TSV files containing queries.
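Deleting the second line of a database for the v1.1.0 upgrade is easy in any text editor. For anyone who prefers a script, here is a minimal Lua sketch; it assumes the database is a plain-text file, and both the function name and the file name in the example are placeholders, not part of AskWise. Keep a backup of the database before trying it.

```lua
-- Hypothetical helper: rewrite a text file without its second line.
local function remove_second_line(path)
  local lines = {}
  for line in io.lines(path) do -- io.lines strips the trailing "\n"
    lines[#lines + 1] = line
  end
  if #lines >= 2 then
    table.remove(lines, 2)
  end
  local f = assert(io.open(path, "w"))
  for _, line in ipairs(lines) do
    f:write(line, "\n")
  end
  f:close()
end

-- Example (placeholder file name):
-- remove_second_line("my_database.txt")
```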

Also, in the process of writing v1.1 I might have fixed some bugs that might or might not have existed in v1.0. :-)

For more info about AskWise you might want to read all posts about it.

The new version can be downloaded from here.


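PS: Here is a small Lua quine I made (it relies on Lua 5.1 accepting \[ and \] as escapes in quoted strings; Lua 5.2 and later reject them as invalid escape sequences): s=[[ print("s=\[\["..s.."\]\]"..s) ]] print("s=\[\["..s.."\]\]"..s)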
Quines are programs that output their own source code when run. You can find quines in many languages here.
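For comparison, another common way to write a Lua quine avoids the long-bracket escaping altogether: string.format's %q option adds the quotes around the string by itself. This is a different trick from the quine above, not a fixed version of it, and it should run on Lua 5.1 and later. It carries no comments, since any comment would also have to be reproduced in the output:

```lua
s="s=%q print(s:format(s))" print(s:format(s))
```

The format string contains the whole program with %q in place of the string literal, so formatting s with itself rebuilds the source exactly.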

Friday, January 16, 2009

Email sanitizer-extractor in Lua

Yesterday a friend asked me to write a little script that reads a file and writes every email address it finds to another file, discarding any duplicates. After I wrote the program, he told me he had searched the Internet and couldn't find anything similar, so I should upload it somewhere in case someone else needs it.

To make it more interesting, I made it a single-call program: everything is defined inside the arguments of a single call. In fact there are two calls: the first returns an object, a method of which is immediately called. Anyway, here is the code:

io.open("output.txt","w"):write((string.gsub(" "..io.open(((arg or {})[1] or "input.txt")):read("*a").." ",".-([%w%.%-_]+@[%w%.%-_]+).-",function (email) email=string.lower(email) print("EMAIL: '"..email.."'") emails=emails or {} for index,emailseen in ipairs(emails) do if emailseen==email then return "" end end table.insert(emails,email) return email.."\n" end)))

Now let's break that up and put some comments, shall we?

io.open("output.txt","w"):write( --io.open opens a file in write mode and returns a file handle, which, instead of being stored in a variable, is immediately used by calling its write method.

(string.gsub( --string.gsub will finally return two values: the email list (one per line, lowercase, without duplicates) and the number of replacements it made. The first value is our output. I can discard the second by putting the function call in parentheses, which truncates its list of returned values to just the first one. In Lua, if a function f returns 5, "kostas" and "klapatsimpala", then print((f())) will just print 5.

" "..io.open(((arg or {})[1] or "input.txt")):read("*a").." " --This is the first argument to string.gsub. Like we did before, we open a file (this time in read mode) and immediately use the returned handle to read the whole file. The tricky part is "(arg or {})[1] or "input.txt"". When you run a script with the standalone lua interpreter, it puts the command-line arguments in the global table arg. If arg exists, "arg or {}" evaluates to arg (if the left side of an "or" is a true value, "or" simply results in that), and "(arg)[1]" returns the first argument, which is a custom input filename. That filename "ORed" with "input.txt" simply returns the filename (every string is a true value in Lua, so "or" evaluates to its left operand). If you didn't pass any arguments, the interpreter still creates arg (with the script name in arg[0]), but arg[1] is nil, so "or" falls through to "input.txt". If arg doesn't exist at all, for example when the code runs inside a program that embeds Lua, "(arg or {})" results in a newly created empty table; indexing its first cell gives nil, so "({})[1] or "input.txt"" again results in "input.txt" (if "or" finds a false or nil value on its left, it simply returns the value on its right). Finally I add two space characters, one at the start of the read data and one at the end, so that the pattern matching I use will apply to any emails exactly at the beginning or exactly at the end of the read data.


,".-([%w%.%-_]+@[%w%.%-_]+).-" --The second argument is the pattern. I am breaking up the whole text in the following way: any number of any characters (as few as possible), followed by one or more email-allowed characters (as many as possible), followed by @, followed by one or more email-allowed characters (as many as possible), followed by any number of any characters (as few as possible; since nothing follows it in the pattern, this last part always matches the empty string). The "email-allowed characters" are alphanumerics, dot, dash and underscore. From this pattern I want to capture just the email part.

,function (email) --Now this is the best part. An anonymous function. It is created without being stored in a variable (which would give it a name) and immediately used as an argument to string.gsub. This function accepts a single argument: email. string.gsub will call it for every match with the capture (the email) as an argument.

email=string.lower(email) --First of all we turn the email to lowercase

print("EMAIL: '"..email.."'") --Debugging message...

emails=emails or {} --Remember what we said: if the left operand of "or" is true (not false and not nil), it is returned. So if the emails variable has already been defined, nothing changes, because in effect emails=emails is executed. If emails is not defined yet (is nil), "or" returns its right operand, so in effect emails={} is executed and emails is initialized as an empty table.

for index,emailseen in ipairs(emails) do --For every already seen email do:

if emailseen==email then return "" end --If this already seen email is the same as the new capture, just return "" so that the whole match is replaced by nothing. Remember that although the capture is just the email, the match includes the email as well as the characters preceding it.

end --end for.

table.insert(emails,email) --If we managed to get here then this email capture is seen for the first time. We insert it in the emails table.

return email.."\n" --and finally we return the email capture (lowercase) followed by a newline. This will replace the whole match.

end --end of the anonymous function

) --closing of string.gsub

) --that's the extra pair of parentheses around string.gsub (to discard its second return value)

) --closing of write.
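The two Lua idioms the walkthrough relies on, "or" as a default-value operator and parentheses that keep only a call's first return value, can be tried in isolation. This is a standalone sketch: args is a stand-in variable (so the real arg table isn't shadowed) and three is a made-up function.

```lua
-- "or" returns its left operand if that is truthy (anything but nil and
-- false), otherwise its right operand: a handy default-value idiom.
local args = nil -- stand-in for a missing arg table
local filename = (args or {})[1] or "input.txt"
print(filename) --> input.txt

-- Wrapping a call in parentheses keeps only its first return value.
local function three() return 5, "kostas", "klapatsimpala" end
print(select("#", three()))   --> 3
print(select("#", (three()))) --> 1

-- string.gsub returns the new string and the number of substitutions.
local result, count = string.gsub("a b a", "a", "x")
print(result, count) -- prints x b x and 2, separated by a tab
```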


That's all. It seems to be working, but I haven't done any extensive debugging.
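One quirk of the one-liner: the pattern only ever matches text up to and including an email, so anything after the last email is never matched, and string.gsub copies it unchanged into output.txt. A more conventional multi-line sketch of the same idea avoids this by collecting matches with string.gmatch instead of substituting them, and uses a table as a set so the duplicate check doesn't rescan every email seen so far. The names extract_emails and run are made up for illustration. The character class is the same as in the one-liner, so it is just as loose about what counts as an address (a trailing full stop, for example, ends up in the match).

```lua
-- Collect lowercase email addresses in first-seen order, without duplicates.
local function extract_emails(text)
  local seen, found = {}, {}
  for email in text:gmatch("[%w%.%-_]+@[%w%.%-_]+") do
    email = email:lower()
    if not seen[email] then -- set lookup instead of a linear scan
      seen[email] = true
      found[#found + 1] = email
    end
  end
  return found
end

-- Read input_name and write the emails, one per line, to output_name.
local function run(input_name, output_name)
  local fin = assert(io.open(input_name, "r"))
  local text = fin:read("*a")
  fin:close()
  local fout = assert(io.open(output_name, "w"))
  for _, email in ipairs(extract_emails(text)) do
    fout:write(email, "\n")
  end
  fout:close()
end

-- Usage, with the same defaults as the one-liner:
-- run((arg or {})[1] or "input.txt", "output.txt")
```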
