Using the bash script from my previous post, I can generate a list of all movie titles on tehconnection.eu and store them, one per line, in a text file called tehconnection.txt. I wanted this so I could sift through a giant list of movies I had on an old external hard drive. This Ruby script (save it as sort_by_tehconnection.rb) checks my directory's list of movies against the list from tehconnection and moves the files that appear in that list into a separate directory (which I'll then double-check and delete).
#!/opt/local/bin/ruby
require "levenshtein_distance.rb"
require 'fileutils'
look_dir='/Volumes/Senorita/DOCUMENTARIES/Feature Documentaries/'
found_dir='/Volumes/Senorita/DOCUMENTARIES/Feature Documentaries/tehconnection/'
#files_array=File.readlines('files.txt');
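# Every entry in look_dir except the directory pseudo-entries and the Finder metadata file.
files_array=Dir.entries(look_dir).reject{|entry| entry == "." || entry == ".." || entry == ".DS_Store"};
# Normalise each tehconnection title: strip punctuation, lowercase, drop a leading "the ".
teh_array=File.readlines('tehconnection.txt').collect {|l| l.gsub(/[.!?:';"]/,"").downcase.sub(/^the /,"").chomp};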
# Split "title (aka other title)" entries into two separate titles.
teh_array = teh_array.inject([]) do |res,e|
  if e =~/([^\(]*) \(aka ([^\)]*)\)/
    res.concat([$1,$2])
  else
    res.concat([e])
  end
end
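# Bucket the titles by length so each file is only compared against titles of similar length.
teh_map = []
teh_array.each do |e|
  # Index by the title's length (not length-1): an empty title would otherwise map to index -1.
  i = e.length
  teh_map[i] ||= []
  teh_map[i] << e
end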
# A match needs a Levenshtein distance strictly below this value, i.e. 0 or 1 edits.
acceptable_dist=2
files_array.each do |orig_title|
  # Normalise the file name: drop a parenthesised group, lowercase,
  # strip the extension, a leading "the " and any double quotes.
  title=orig_title.sub(/ *\([^\(]*\)/,"")
  title.downcase!
  title.sub!(/\.[^\.]*$/,"")
  title.sub!(/^the /,"")
  title.gsub!("\"","")
  title.sub!(/^marx brothers - /,"")
  len = title.length
  # Only titles whose length is within acceptable_dist of this one can match.
  max_len = [len+acceptable_dist,teh_map.length-1].min
  min_len = [len-acceptable_dist,0].max
  puts "#{title}..."
  found = false
  (min_len..max_len).each do |l|
    l_array = teh_map[l]
    next if l_array.nil?
    l_array.each do |teh_title|
      dist=levenshtein_distance(title,teh_title)
      if dist<acceptable_dist
        # Close enough: move the file into the "found" directory and stop searching.
        puts " FOUND (#{orig_title}): #{teh_title} --> #{title}, #{dist}"
        FileUtils.mv(File.join(look_dir,orig_title),File.join(found_dir,orig_title))
        found = true
        break
      end
    end
    break if found
  end
end
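
The script requires levenshtein_distance.rb, which isn't shown here. As a rough sketch of what such a file might contain, assuming it defines a top-level levenshtein_distance(a, b) that returns the edit distance between two strings (this is a stand-in using the standard row-by-row dynamic-programming approach, not necessarily the original file):

```ruby
# Hypothetical stand-in for levenshtein_distance.rb; the original file is not shown.
# Returns the minimum number of single-character insertions, deletions or
# substitutions needed to turn string a into string b.
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # prev is the previous row of the edit-distance table:
  # prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # turning i chars of a into an empty string takes i deletions
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,         # delete ca
               curr[j - 1] + 1,     # insert cb
               prev[j - 1] + cost].min # substitute (or keep a matching char)
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_distance("duck soup", "duck soop") # => 1
```

With acceptable_dist=2 and the dist<acceptable_dist test in the script, a pair like the one above (one substituted letter) counts as a match, while anything two or more edits apart does not.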