Apple is known for its innovative advertisements:
Friday, January 30, 2009
Tuesday, January 27, 2009
AskWise v1.1.0
I would like to test this version a bit more, but it seems stable, so I am releasing it. The new features, as listed in the changelog, are:
v1.1.0: The stiffness has been removed. Some minor improvements to the prediction algorithm. The database format has also been changed. To upgrade a database to the new format, simply delete its second line. Batch prediction feature added*. Nano is now the default external editor for Linux.
*The batch prediction feature lets you get lots of predictions at once by feeding AskWise TSV files containing queries.
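The database upgrade step is easy to script. Below is a minimal Lua sketch, assuming the database is a plain text file with newline-separated lines; `drop_second_line` and the `askwise.db` file name are hypothetical and not part of AskWise. Keep a backup of the database before running anything like this on it.

```lua
-- Hypothetical helper (not part of AskWise): removes the second line of a
-- v1.0 database file, which is the upgrade step described in the changelog.
local function drop_second_line(path)
  local f = assert(io.open(path, "rb"))
  local content = f:read("*a")
  f:close()

  -- Position of the newline that ends line 1.
  local first_end = string.find(content, "\n", 1, true)
  if not first_end then
    return false  -- fewer than two lines: nothing to remove
  end
  -- Position of the newline that ends line 2 (nil if line 2 is the last line).
  local second_end = string.find(content, "\n", first_end + 1, true)
  local rest = second_end and string.sub(content, second_end + 1) or ""

  -- Rewrite the file as line 1 (with its terminator) followed by everything after line 2.
  f = assert(io.open(path, "wb"))
  f:write(string.sub(content, 1, first_end), rest)
  f:close()
  return true
end

-- Example usage (hypothetical file name):
-- drop_second_line("askwise.db")
```

Cutting at the `\n` characters keeps any `\r` attached to its own line, so files with Windows line endings are handled the same way.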
Also, while writing v1.1 I may have fixed some bugs that may or may not have existed in v1.0. :-)
For more info about AskWise you might want to read all posts about it.
The new version can be downloaded from here.
PS: Here is a small Lua quine I made:
s=[[ print("s=\[\["..s.."\]\]"..s) ]] print("s=\[\["..s.."\]\]"..s)
Quines are programs that output their source code when run. You can find Quines in many languages here.
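The quine above relies on `\[` and `\]` being accepted inside a quoted string. Lua 5.1 accepts them (the backslash is simply dropped), but Lua 5.2 and later reject them as invalid escape sequences. The variant below is an adaptation, not the original: it avoids escapes by splitting each bracket pair with concatenation, so no `]]` ever appears inside the long string. It should also run under 5.1. A quine cannot carry explanatory comments without having to print them too, so the block is just the one line; running it prints exactly that line followed by a newline.

```lua
s=[[ print("s=[".."["..s.."]".."]"..s)]] print("s=[".."["..s.."]".."]"..s)
```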
Categories:
My Progs,
My Progs: AskWise,
Portable Applications
Friday, January 16, 2009
Email sanitizer-extractor in Lua
Yesterday a friend asked me to write a little script that reads a file and writes every email address it finds to another file, discarding any duplicates. After I wrote it, he told me he had searched the Internet and couldn't find anything similar, so I should upload it somewhere in case someone else needs it.
To make it more interesting, I wrote it as a single-call program: everything is defined inside the arguments of a single call. Strictly speaking there are two calls: the first (io.open) returns a file object, and a method of that object (write) is called immediately. Anyway, here is the code:
io.open("output.txt","w"):write((string.gsub(" "..io.open(((arg or {})[1] or "input.txt")):read("*a").." ",".-([%w%.%-_]+@[%w%.%-_]+).-",function (email) email=string.lower(email) print("EMAIL: '"..email.."'") emails=emails or {} for index,emailseen in ipairs(emails) do if emailseen==email then return "" end end table.insert(emails,email) return email.."\n" end)))
Now let's break that up and put some comments, shall we?
io.open("output.txt","w"):write( --io.open opens a file in write mode and returns a file handle which, instead of being stored in a variable, is used immediately by calling its write method.
(string.gsub( --string.gsub returns two values: the email list (one per line, lowercase, without duplicates) and the number of replacements it made. The first value is our output. I discard the second by putting the function call in parentheses, which truncates its list of return values to the first one. For example, with function f() return 5,"kostas","klapatsimpala" end, print(f()) prints all three values, but print((f())) prints just 5.
" "..io.open(((arg or {})[1] or "input.txt")):read("*a").." " --This is the first argument to string.gsub. As before, we open a file (this time in read mode) and immediately use the returned handle to read the whole file. The tricky part is (arg or {})[1] or "input.txt". When a script is run by the standalone lua interpreter, its command-line arguments are stored in the global table arg. If arg exists, "arg or {}" evaluates to arg (when the left operand of "or" is a true value, "or" simply returns it), and arg[1] is the first command-line argument, a custom input filename. That filename "ORed" with "input.txt" simply returns the filename, since every string is a true value in Lua. If no filename was given, arg[1] is nil; and if the host program does not define arg at all, "(arg or {})" produces a new empty table whose first element is also nil. Either way the expression falls through to "input.txt" (when "or" finds false or nil on its left, it returns the value on its right). Finally I add a space to the start and to the end of the read data, so that the pattern also applies to emails at the very beginning or the very end of the text (strictly speaking this is only a precaution, because .- can also match the empty string).
,".-([%w%.%-_]+@[%w%.%-_]+).-" --The second argument is the pattern. It breaks the text up as follows: any number of any characters (as few as possible), followed by one or more email-allowed characters (as many as possible), followed by @, followed by one or more email-allowed characters (as many as possible), followed by any number of any characters (as few as possible). The email-allowed characters are alphanumerics, dot, dash and underscore. Only the email part is captured. Note that a lazy .- at the very end of a pattern always matches the empty string, so each match really consists of the text before an address plus the address itself, and any text after the last address is never matched: string.gsub copies it unchanged into the output.
,function (email) --Now this is the best part: an anonymous function. It is created without being stored in a variable (which would give it a name) and is passed directly as an argument to string.gsub. It accepts a single argument, email: string.gsub calls it for every match, passing the capture (the email) as the argument.
email=string.lower(email) --First of all we turn the email to lowercase
print("EMAIL: '"..email.."'") --Debugging message...
emails=emails or {} --Remember what we said: if the left operand is true (neither false nor nil), "or" returns it, so if emails is already defined nothing changes, because emails=emails is executed. If emails is not defined yet (it is nil), "or" returns its right operand, so emails={} is executed and emails is initialized as an empty table. Because emails is a global variable, the table persists across calls of the function.
for index,emailseen in ipairs(emails) do --For every already seen email do:
if emailseen==email then return "" end --If this already-seen email is the same as the new capture, return "" so that the whole match is replaced by nothing. Remember that although the capture is just the email, the match also includes the characters preceding it.
end --end for.
table.insert(emails,email) --If we managed to get here then this email capture is seen for the first time. We insert it in the emails table.
return email.."\n" --and finally we return the email capture (lowercase) followed by a newline. This will replace the whole match.
end --end of the anonymous function
) --closing of string.gsub
) --the extra pair of parentheses around string.gsub (to discard its second return value)
) --closing of write.
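The walkthrough leans on a few Lua behaviors that are easy to check in isolation. The snippet below is a small sketch of them; the helper names (`input_name`, `three`) and the sample strings are made up for illustration and are not part of the script.

```lua
-- 1. `a or b` yields a unless a is nil or false; otherwise it yields b.
--    This is how the script picks its input file name.
local function input_name(args)
  return (args or {})[1] or "input.txt"
end
print(input_name(nil))           --> input.txt
print(input_name({}))            --> input.txt
print(input_name({"mail.txt"}))  --> mail.txt

-- 2. Wrapping a call in parentheses keeps only its first return value.
local function three()
  return 5, "kostas", "klapatsimpala"
end
print(select("#", three()))      --> 3
print(select("#", (three())))    --> 1

-- 3. string.gsub returns the new string and the number of replacements.
local replaced, count = string.gsub("a,b,c", ",", ";")
print(replaced)                  --> a;b;c
print(count)                     --> 2

-- 4. The extractor's pattern: the capture is only the address, and the
--    trailing lazy `.-` matches the empty string, so the match ends
--    right after the address.
local pattern = ".-([%w%.%-_]+@[%w%.%-_]+).-"
local text = " write to Foo.Bar@Example.com today "
local _, last, email = string.find(text, pattern)
print(email)                                    --> Foo.Bar@Example.com
print(string.format("%q", text:sub(last + 1)))  --> " today "

-- 5. Lazy initialization, as done with `emails`.
local seen
seen = seen or {}     -- seen is nil, so a new table is created
local same = seen
seen = seen or {}     -- seen is already a table, so it is kept
print(same == seen)   --> true
```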
That's all. It seems to work, but I haven't tested it extensively.
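Because string.gsub copies any unmatched text through, whatever follows the last address in the input also ends up in output.txt (the trailing `.-` in the pattern always matches the empty string). If the single-call constraint is dropped, a sketch built on `string.gmatch`, which yields only the matched addresses, avoids that. It also swaps the linear scan over `emails` for a lookup table, so each duplicate check is one table access. This is an alternative approach, not the original script; `extract_emails` is a name made up here, while the file names follow the original.

```lua
-- Alternative sketch (not the original one-liner): collect addresses with
-- string.gmatch, which yields only the matched text, and use a table as a
-- set so each duplicate check is a single lookup.
local function extract_emails(text)
  local seen, result = {}, {}
  for email in string.gmatch(text, "[%w%.%-_]+@[%w%.%-_]+") do
    email = string.lower(email)
    if not seen[email] then
      seen[email] = true
      result[#result + 1] = email  -- keep first-seen order
    end
  end
  return result
end

-- Same file-name convention as the original: first argument or input.txt.
local input_name = (arg or {})[1] or "input.txt"
local infile, err = io.open(input_name, "r")
if not infile then
  io.stderr:write("cannot open input: ", tostring(err), "\n")
else
  local text = infile:read("*a")
  infile:close()
  local outfile = assert(io.open("output.txt", "w"))
  for _, email in ipairs(extract_emails(text)) do
    outfile:write(email, "\n")
  end
  outfile:close()  -- flush and close explicitly instead of relying on exit
end
```

Like the original pattern, this one treats a sentence-ending period as part of an address, so `a@b.com.` is captured with the trailing dot.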
Categories:
My Progs,
Programming