Friday, July 2, 2021

Aperito v1.2

Aperito has now reached v1.2 and has two new commands, "nonstop" and "dostop", which respectively disable and enable halting on errors while moving files. It is faster when processing smaller files, and it now preserves the modification date and the permissions of all parent directories that it creates while replicating the hierarchy of the duplicate files.

  The latest version can be downloaded from here, and the signature from here.

Sunday, June 27, 2021

Aperito v1.1

I decided to go ahead and cross out a few items from the to-do list in my previous post.

Aperito has now reached v1.1 and has a dry-run command. It will also always check your whole script before it starts executing it, so you should rarely have to wait hours for the first part of a script to finish only to get an error about a missing argument later in the script. Finally, I made a few cosmetic improvements to the "ask" command.

  The latest version can be downloaded from here, and the signature from here.

Saturday, June 19, 2021

Aperito - Duplicate File Manager

I often have to tidy up files with lots of duplicates, and I've tried quite a few duplicate finder programs, but I always found them lacking in the features that I need. So I made my own and called it Aperito, which means "plain, austere" (as in "something that doesn't have things you don't need") in Greek.
   
Aperito is a somewhat scriptable duplicate file manager. Cleaning up duplicate files from a directory is as simple as running:

    aperito scancleanup MyDirectory

But Aperito allows you to perform much more complex deduplication. For example, if you want to delete all files under Dir3 that also exist under Dir1 or Dir2 but not touch any files under Dir1 and Dir2 nor deduplicate any files which show up multiple times within Dir3, you could run this:

    aperito scan Dir1 scan Dir2 cleanup Dir3
   
If you wanted to do the same but also deduplicate the files that show up more than once inside Dir3, you slightly change the command:

    aperito scan Dir1 scan Dir2 scancleanup Dir3
   
Or, let's assume Dir1 is actually an external drive that you don't keep mounted all the time. In that case you could scan Dir1 when it's mounted with:

    aperito scan Dir1 save dir1-files.asd
   
which would create a file that can later be used like this:

    aperito load dir1-files.asd scan Dir2 scancleanup Dir3
   
Aperito will try to parallelize operations to some extent if it thinks that the results will be predictable. For example, in the above command, it will load the savefile while scanning Dir2 at the same time.

Aperito will never delete any duplicate files; instead, it will create a new directory and move them there. For example, if you run:

    aperito scancleanup Dir4
   
any duplicate files like Dir4/subdir/filename.ext will be moved to Dir4-Aperito-duplicates/subdir/filename.ext. That way, if you want to revert the deduplication, you can simply move the contents of Dir4 into Dir4-Aperito-duplicates and let your OS handle the merges. Finally, delete the now-empty Dir4 and rename Dir4-Aperito-duplicates to Dir4. You could also merge in the opposite direction (move the contents of Dir4-Aperito-duplicates into Dir4 and then delete Dir4-Aperito-duplicates), which is simpler, but it may affect directory permissions because the directories created under Dir4-Aperito-duplicates do not necessarily have the same permissions as the original directories under Dir4.
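The path mapping described above can be sketched in Python. This is only an illustration, not Aperito's code: the function name `duplicates_destination` is hypothetical, and it assumes the duplicates directory is always a sibling of the scanned directory named with the "-Aperito-duplicates" suffix, as in the Dir4 example.

```python
from pathlib import PurePosixPath


def duplicates_destination(root, duplicate):
    """Return where a duplicate found under `root` would be moved.

    Assumption: the destination is a sibling directory called
    "<root>-Aperito-duplicates" that mirrors the original hierarchy.
    """
    root_path = PurePosixPath(root)
    # Path of the duplicate relative to the scanned root,
    # e.g. "subdir/filename.ext".
    relative = PurePosixPath(duplicate).relative_to(root_path)
    # Sibling of the root with the suffix appended to its name.
    duplicates_root = root_path.with_name(root_path.name + "-Aperito-duplicates")
    return str(duplicates_root / relative)


print(duplicates_destination("Dir4", "Dir4/subdir/filename.ext"))
```

Because the relative path is preserved, merging the two directory trees back together puts every moved file in its original place.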

Aperito starts up with an empty internal state (no files have been seen yet) and then reads the commands on its command line in sequence. These commands may add files into Aperito's internal state as "seen", or they may deduplicate (move away to a separate directory, as described above) files that have been "seen" more than once.

The available commands are:

    scan "directory"            Scans a directory tree and adds all the files in it to the internal state of Aperito as "seen". It will not deduplicate anything though.
                               
    cleanup "directory"         Scans a directory tree and deduplicates all the files that have already been seen. It will not add the scanned files into the internal state as "seen" though; therefore, if a file shows up twice in this directory tree, it will not be deduplicated. To be deduplicated, a file under this tree needs to have been "seen" before the cleanup command was run.
                               
    scancleanup "directory"     Like the cleanup command, but this time it will not only deduplicate "seen" files, it will also add all the files into the internal state of Aperito as "seen". Therefore, if a file shows up twice (or more times) under this directory tree, it will be deduplicated. Of course, files "seen" before this command is run will also be deduplicated, even the first time they are encountered within this directory.
                               
    save "savefile.asd"         Saves the internal state of Aperito to a file so that you can load it some other time. Useful for scanning external drives once and then being able to deduplicate files from other drives as if the "saved" drive was present. Can also be used to speed up scanning of directories that you know to be unchanged.
                               
    load "savefile.asd"         Loads a saved state. The saved state is merged with the current internal state of Aperito so you can write this command multiple times to load multiple files.
                               
    reset                       Resets the internal state of Aperito. All "seen" files will be forgotten after this command.
                               
    keep shallow/deep           These two commands affect the behavior of any scancleanup commands that follow. "Keep shallow" will cause scancleanup to keep the file which is closest to the root when one or more duplicates are found while "keep deep" does the opposite and keeps the most deeply nested file (this is the default behavior).
                               
    ask                         Similar to the previous two commands but this time it will make Aperito ask you which file you want to keep. You will also be given the choice to select any parent directory of each file so that all files under that directory will be kept. If you select two directories so that all files under them will be kept, and then a duplicate file which exists under both of them is found, it will be kept in both directories.
                               
    wait                        Waits for all previous commands to finish before proceeding to the next command(s) even if they could be run in parallel.
                               
    threads n                   Number of threads that will be used to hash the contents of files per command that runs in parallel. By default n=2. Affects commands after it only.
                               
    compare "savefile.asd"      Compare the currently "seen" files with the hashes stored in the given saved state file. It will print out the hashes (and one location for each hash) that exist only on one of the two. Useful for checking if two locations have the same data, without comparing the actual directory tree structure.
                               
    exclude "regex"             Exclude files whose path and filename contain a substring that matches the given regular expression. Matching files will not be scanned at all. This command affects any commands that follow it. Loading a saved state is not affected by exclusions.
                               
    noexclude                   If you have used the exclude command, noexclude can be used to remove all exclusions for all the commands that follow it.
                               
    and                         Not exactly a command by itself, but it can be used right after the directory paths of scan, cleanup and scancleanup to make those commands process multiple paths as if they were one. For scan and cleanup, the difference between using "and" and simply repeating the command once per directory is minor: with "and", the configured number of threads is used to scan the directories as if they were a single directory, while repeating the command allows Aperito to run the multiple scan or cleanup commands in parallel, multiplying the number of threads used. The effect on scancleanup is more pronounced: using "and" instead of two scancleanup commands causes files that are duplicated across the two directories to be deduplicated according to the rules (deepest, shallowest or by asking the user), while using two scancleanup commands (one for each directory) causes files that exist in both directories to be deduplicated away from the second directory even if, for example, you have elected to keep the deepest duplicate and the copy in the second directory is the deepest. The reason is that scancleanup commands do not run in parallel, and each behaves like a regular cleanup command with regard to files seen by previous commands (so files seen by the first scancleanup command will always be removed when seen by later scancleanup commands, regardless of the rules).
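The substring matching described for the exclude command can be illustrated with a short Python sketch. It is not Aperito's implementation: the helper name `is_excluded` and the sample paths are made up, it assumes several exclusions can be active at once, and Python's regular expression syntax may differ in details from the one Aperito accepts.

```python
import re


def is_excluded(path, patterns):
    """True if any pattern matches somewhere inside the full path."""
    # re.search (rather than re.fullmatch) mirrors the "contain a
    # substring that matches" wording of the exclude command.
    return any(re.search(pattern, path) for pattern in patterns)


paths = [
    "Photos/2020/img001.jpg",
    "Photos/.cache/thumb.db",
    "Photos/2020/notes.txt",
]
patterns = [r"/\.cache/", r"\.txt$"]

# Only paths that match none of the patterns would be scanned.
kept = [p for p in paths if not is_excluded(p, patterns)]
print(kept)
```

Anchors such as `$` still work with substring matching, which is how the second pattern targets only paths ending in ".txt".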
                               
                               
Remember that the internal state (which files have been "seen") is not preserved between runs unless you run the save command and then load it with the load command.
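The difference between scan, cleanup and scancleanup can be made concrete with a toy model of the "seen" state. This is a sketch under stated assumptions, not Aperito's code: the `SeenState` class, the representation of a directory as a dict of path to content hash, and measuring depth by counting "/" are all hypothetical choices made for illustration.

```python
def depth(path):
    """Nesting depth of a path, measured by its number of separators."""
    return path.count("/")


class SeenState:
    """Toy model of the internal state; each method returns the moved paths."""

    def __init__(self):
        self.seen = {}  # content hash -> one path where it was seen

    def scan(self, files):
        # Remember every file as "seen"; nothing is moved.
        for path, digest in files.items():
            self.seen.setdefault(digest, path)
        return []

    def cleanup(self, files):
        # Move files whose content was seen earlier; remember nothing new.
        return [path for path, digest in files.items() if digest in self.seen]

    def scancleanup(self, files, keep="deep"):
        by_hash = {}
        for path, digest in files.items():
            by_hash.setdefault(digest, []).append(path)
        moved = []
        for digest, paths in by_hash.items():
            if digest in self.seen:
                # Seen by an earlier command: every copy here is moved.
                moved.extend(paths)
                continue
            # New content: keep one copy according to the keep rule.
            keeper = max(paths, key=depth) if keep == "deep" else min(paths, key=depth)
            self.seen[digest] = keeper
            moved.extend(p for p in paths if p != keeper)
        return moved


dir1 = {"Dir1/a.txt": "h1"}
dir2 = {"Dir2/b.txt": "h2"}
dir3 = {"Dir3/a-copy.txt": "h1", "Dir3/c.txt": "h3", "Dir3/sub/c.txt": "h3"}

state = SeenState()
state.scan(dir1)
state.scan(dir2)
print(state.cleanup(dir3))      # only the copy of a file from Dir1 is moved
print(state.scancleanup(dir3))  # the shallower copy of c.txt is moved as well
```

The two calls at the end correspond to "scan Dir1 scan Dir2 cleanup Dir3" and "scan Dir1 scan Dir2 scancleanup Dir3" from the examples earlier in the post.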

Commands that can be run in parallel if they appear sequentially are:

 * Save(s), cleanup(s) and diff(s).
 * Load(s) and scan(s)

Reset, wait, threads and scancleanup are always run atomically. Keep and ask will wait for any pending scancleanup to finish before being run.
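The grouping rules above can be sketched as a small function that splits a script into batches of commands that may run together. This is a guess at the behavior as described, not Aperito's scheduler: `batch_commands` and `PARALLEL_GROUPS` are hypothetical names, "diff" is accepted alongside "compare" because the list above mentions it, and the extra rule about keep and ask waiting for a pending scancleanup is not modeled.

```python
# Sets of command names that may run in parallel when they are adjacent.
PARALLEL_GROUPS = [
    {"save", "cleanup", "compare", "diff"},
    {"load", "scan"},
]


def batch_commands(commands):
    """Group consecutive command names that may run in parallel."""
    batches = []
    current_group = None
    for name in commands:
        # Find the parallel group this command belongs to, if any.
        group = next((g for g in PARALLEL_GROUPS if name in g), None)
        if group is not None and group is current_group:
            # Same group as the previous command: join its batch.
            batches[-1].append(name)
        else:
            # Different group, or a command that always runs alone.
            batches.append([name])
        current_group = group
    return batches


script = ["load", "scan", "scancleanup", "save", "cleanup", "wait", "scan"]
print(batch_commands(script))
```

With this model, "load dir1-files.asd scan Dir2 scancleanup Dir3" becomes two batches: the load and the scan together, then the scancleanup on its own.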

If you have 3 scan commands one after the other and the default number of threads (i.e. 2), that will give you 2*3=6 threads processing file contents in parallel. If all three directories you are scanning are on the same disk, and the disk is rotational rather than an SSD, this may cause more overhead due to seek time, so you should consider either reducing the threads per scan (threads 1) or putting wait commands between the scan commands.
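The thread arithmetic can be written down as a tiny helper. The function name is hypothetical, and it assumes every command running in parallel uses the same threads setting.

```python
def hashing_threads(parallel_commands, threads_per_command=2):
    """Total hashing threads when several commands run at the same time."""
    return parallel_commands * threads_per_command


print(hashing_threads(3))                         # three scans, default threads
print(hashing_threads(3, threads_per_command=1))  # three scans after "threads 1"
print(hashing_threads(1))                         # scans separated by "wait"
```

The three calls give 6, 3 and 2 threads respectively, which shows why either option reduces the load on a single rotational disk.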

When Aperito explains why it's moving a file to the duplicates directory, the second path may start with [?] which means that this is a path loaded from a saved state with the load command and therefore may not currently exist or, if it is a relative path, may not be relative to the current working directory.

Aperito is written in Go (my first program in that language) and is freeware for now but I'll think about opening the source code later. I'd like to see it included in Debian's repos one day but until this becomes realistic I'll probably stick with simply freeware. This is still the first version after all, and I have more features planned.

You can download Aperito from here. The zipfile contains binaries for Linux (32/64bits and 32/64bit ARM for Raspberry etc), Windows (32/64bits) and Mac (ARM/AMD). You can download the PGP signature for the zipfile from here. My key should be on the sidebar.

Please take care while using Aperito. Do not run commands that others give you unless you understand them.

And before I go, here are some things that I am thinking of adding in the future:

  • Dry-run command. Not super necessary since Aperito doesn't delete files anyhow so to revert whatever it does you just merge directories again. Still, good to have.
  • Check the full script before starting to run it. Right now a mistake in a command won't be discovered until the command is reached.
  • "Forget" command to selectively remove files that match a regular expression from the "seen" memory.
  • "Include" command. I think you can already emulate an include command with a properly crafted "Exclude" regular expression, but it may be worth having an actual, easier-to-use include command.

Wednesday, December 2, 2020

Colors I see you

colours.icu ("Colors I see you") is a website I quickly made to test out an idea for an animated color blindness correction filter.

Normal color blindness correction filters simply transform the color space of an image so that color blind people can distinguish between some of the colors that they normally can't. That also means, though, that in the "corrected" version they can no longer distinguish between some colors that they normally could, and all the colors look completely wrong.

So the idea I had was to use animation to show the person the base image in its correct colors while periodically flashing a second version of the image in which areas become brighter or darker depending on what color they really are. For example, red details flash bright while green details flash dark.
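The flashing idea can be illustrated per pixel with a short Python sketch. This is not the math the website uses: the function `flash_pixel`, the red-minus-green signal, the cosine pulse, and the `period` and `strength` parameters are all assumptions chosen to show the principle.

```python
import math


def flash_pixel(rgb, t, period=2.0, strength=0.5):
    """Return the pixel color at time t (seconds) of a red/green flash."""
    r, g, b = rgb
    # Red-vs-green signal in [-1, 1]: positive for reddish pixels,
    # negative for greenish ones, zero for neutral colors.
    opponent = (r - g) / 255.0
    # Smooth pulse in [0, 1]: 0 shows the unmodified image,
    # 1 is the peak of the flash.
    pulse = 0.5 * (1.0 - math.cos(2.0 * math.pi * t / period))
    # Brighten reddish pixels and darken greenish ones, scaled by the pulse.
    gain = 1.0 + strength * pulse * opponent
    return tuple(max(0, min(255, round(c * gain))) for c in (r, g, b))


red, green = (200, 50, 50), (50, 200, 50)
for t in (0.0, 0.5, 1.0):
    print(t, flash_pixel(red, t), flash_pixel(green, t))
```

At t=0 both pixels keep their true colors; halfway through the period the reddish pixel is at its brightest and the greenish one at its darkest, so the two can be told apart by brightness even when their hues look alike.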

For example take this image that someone posted on reddit:


It's a tree that hasn't become fully red during autumn yet because part of it was illuminated by that lamp. To someone with protanopia, it looks like this (if I implemented my filters right):


The distinction between red and green is gone. So what the website gives you is the following image:

 

 

The animation on the website is smoother, not a bright flash like in this GIF file, so if you want to see it properly just go to colors.icu and upload something yourself.

You can see that the red parts of the image are flashing brighter while the green ones are staying dark or becoming even darker.

I am still not very confident that I did all the math right and I plan to try a slightly different way to do the correction, especially for deuteranopia.

The site can also do correction for the anomalous forms of color blindness (when people are trichromats but with slightly wrong detection for one of the colors) but because there's nothing super scientific about the website, just try all the correction modes and find the one that works best for you.

Tuesday, December 1, 2020

Autozeep v6

Version 6 adds the ability to revert all files to their original compression setting. The detailed stats generation debugging feature has been removed to avoid a potential bug with big INI files. It contains a richer list of excluded files. It will also show you the total savings after every run, both in the popup after the run and inside the compressed_files.txt file. Finally, I changed the icon and the license.

Oh and, as always, a bunch of antiviruses detect it as a virus like they do for everything written in AutoIt3. This has actually caused me grief lately, as a company called Netcraft keeps sending takedown requests to my host to remove a bunch of my software from my server. That's software that I've been hosting for years, and now suddenly their scanners are picking it up and I keep getting 24-hour notices to take down stuff. Thankfully, Netcraft is usually quick to reply to my false-positive claims and withdraws its takedown requests. But it's annoying that I have to keep doing this just because some antiviruses that they decided to use throw false positives for AutoIt3 scripts.

You can download Autozeep v6 from here.
