Exporting data from Anime-Planet

GoBusto
NOTE: I was originally going to post this on my blog, but then I realised that I don't actually have one, so I've posted it here in the hopes that someone else might find the story of my travails useful.

Larry Wall once said that there are three attributes of a great programmer:

1. Laziness
2. Hubris
3. Something else, which I didn't look up because of point 1.

As a lazy great programmer, I am paradoxically willing to go to extreme lengths to avoid any form of unnecessary work. Sometimes these lengths actually involve more effort than would have been expended in actually doing the thing I am avoiding, but this is who I am, don't you try to change me, dad.

Several years ago, I signed up for Anime-Planet (http://www.animeplanet.com/) so that I could keep track of the Japanese Schoolgirl Robot Cartoons I waste my mortality on. Eventually, though, I decided to move on, due to a confluence of the activity feeds being broken ("under maintenance"), video advertisements for the Warcraft movie fading in over the actual page content, and the infamous "ratings shamer" (http://www.animeplanet.com/inc//surprise.jpg) image used to shame users (me) who didn't leave ratings for things they watched, despite the fact that doing so is largely pointless since nothing really matters and we're all going to die eventually.

I tried signing up for MyAnimeList (http://myanimelist.net/) once; the signup process somehow went wrong, leaving me in a quantum superposition wherein I was both registered and unregistered at the same time, unable to log in yet equally unable to create a new account. Emails sent to their support received no response, so I gave up and went back to Anime-Planet.

This time, I decided to try AniList (http://anilist.co/) instead. Thanks to my years of training, I was able to conquer the registration page in less than six decades, a definite improvement. Now I just needed a way to transfer the lists from my Anime-Planet account over to my newly-minted AniList profile.
Step 1: Assuming that we live in a universe in which convenience exists.

AniList has a list import (http://anilist.co/settings/import) feature. This is a good and noble thing. However, Anime-Planet does not have a corresponding export feature. I stared out past the rivulets of raindrops streaming down the window; why, I pondered, do we live in a world where such injustice is not just tolerated, but commonplace?

Maybe there's a public API? AniList has one (https://github.com/joshstar/AniListAPIDocs). I could easily convert JSON data into the right format if I coul- NOPE, there isn't. IDEA ABANDONED.

Since a simple export-and-import approach was clearly off the cards, I went off in search of others who had faced similar circumstances, in hopes of finding an alternative solution.

Step B: Running random code you found on the internet is always a good idea.

Before continuing, I should probably explain how the import/export process works in more detail. According to The Internet, MyAnimeList allows users to import or export lists in a custom XML-based format for backup purposes. Other sites, such as Anime-Planet and AniList, can import these XML files, allowing users to easily transfer their details elsewhere without needing to re-enter everything manually. If I could find a way to convert Anime-Planet lists into MAL-style XML, I could import them in the same way.

But is such a thing even possible? Yes, it is. At some point, aliens... someone wrote a Python 2 script and posted it on pastebin (http://pastebin.com/CMcUiuTR).
I downloaded it and ran it in a terminal window:

    This script will export your animeplanet.com anime list and saves it to animeplanet.xml
    Enter your username: GoBusto
    Traceback (most recent call last):
      File "./animeplanet.comxmlanimeexporter.py", line 13, in <module>
        html = urllib2.urlopen(baseURL).read()
      File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 435, in open
        response = meth(req, response)
      File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python2.7/urllib2.py", line 473, in error
        return self._call_chain(*args)
      File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden

Oh. It seems that this script doesn't work any more.

Aha! There's an updated Python 3 version (http://pastebin.com/ZRRNXsER), let's try that instea-

    This script will export your animeplanet.com anime list and saves it to animeplanet.xml
    Enter your username: GoBusto
    Traceback (most recent call last):
      File "./animeplanettomalexporter.py", line 17, in <module>
        html = urllib.request.urlopen(baseURL).read()
      File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.5/urllib/request.py", line 472, in open
        response = meth(req, response)
      File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
        'http', request, response, code, msg, hdrs)
      File "/usr/lib/python3.5/urllib/request.py", line 510, in error
        return self._call_chain(*args)
      File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

okay.jpg

Why weren't either of these scripts working? Was there some common factor which caused both scripts to fail in the same way?
Then it hit me: They were both written in Python.

In light of this undeniable evidence that it was very clearly Python causing the failures, I decided to adapt one of my old Ruby scripts to do the job instead.

Step Green: Expecting tried-and-tested code to continue working in the future.

Several years ago, I wrote an IRC bot (https://gitlab.com/gobusto/rbningyo) in Ruby as a way of getting familiar with the language. One of the things it could do, via a plugin script, was search Anime-Planet for information about anime or manga when asked to do so by another IRC user. This was my starting point.

Before attempting to modify the script, I first wanted to ensure that it still worked properly:

    irb(main):001:0> require './AnimePlanet.rb'
    => true
    irb(main):002:0> AnimePlanet.new.run anime Rozen Maiden
    => Sorry, I couldn't find any information about Rozen Maiden

Spoiler alert: It didn't.

My guess is that the CloudFlare (https://www.cloudflare.com/) DDoS protection used by Anime-Planet somehow prevents simple scripts from connecting to it and downloading the HTML content, making any attempts to automatically generate an XML file based on the contents of the page impossible.

Step Blarple: Nothing is impossible, don't let your dreams be dreams.

At this point I gave up... I started the long, boring process of manually copying everything from one browser window into another. I continued for THREE HOURS, until I realised that it was 11pm and I had to wake up at 7am for work. Since I need at least 26 hours of sleep per day, I reluctantly turned off my computer and went to bed, angry at the world and all of the things in it, but mostly XML files which could not be auto-generated due to DDoS protection services getting in the way of my valiant attempts to avoid unpaid data entry work.

Just as I was drifting off, a little voice in the back of my head piped up: "Why don't you use Selenium (https://en.wikipedia.org/wiki/WebDriver)?"
For those of you who have never heard of Selenium, here's a brief introduction:

Once upon a time, some dude made a thing called Selenium Remote Control so that he could automatically find errors on the websites he created. This was useful, since it meant that instead of having to manually look for coding errors on his website, he could make coding errors in the scripts used to control Selenium RC instead, thus shifting the blame elsewhere and achieving what is known by us in the industry as "Progress".

This AutoIt-but-just-for-web-browsers (AutoIt: https://en.wikipedia.org/wiki/Autoit) was later superseded by Selenium Webdriver, which properly integrated with Firefox, Chrome, etc. and allowed them to be controlled by a computer program pretending to be a human being (also known as John Carmack). Thus, websites don't see any difference between a browser controlled by Selenium and a browser used by an actual human made of flesh and capillaries and such.

The next day, I came home from work and spent the evening writing a small Ruby script to control Firefox via Selenium Webdriver. The result: A program which asks for a username, and then autopilots a browser through each page of their anime and manga lists, recording the things it finds in JSON format. Success! The script can be found here:

The next thing to do would be to either use the JSON data to update AniList via the AniList API, or to simply have the Selenium script dump the data in XML format, ready for import via the AniList user interface.

A standard anime list format is an idea too beautiful for this sinful world.

Wouldn't it be nice if there were some standard file format that all anime/manga list sites used to import/export data? We would no longer live in a world where screen-scraping your own profile page via a semi-autonomous robo-browser servant is necessary to download your own history of cartoon-based amusement.
It would also be nice to live in a world where free kittens were allocated to every citizen and you could eat pizza for every meal without doctors everywhere crying blood due to your poor but undeniably delicious life choices. What I'm saying is that such a wonderful dream can never be, due to the various factors that our harsh, unforgiving reality imposes on each and every one of us:

Everything has, like, a jillion different names.

How many ways can this name (http://anilist.co/anime/21359/OoyasanwaShishunki) be given? Let us count together:

+ Ooyasan wa Shishunki
+ Ooyasan ha Shishunki
+ Ooyasan wa Shishunki
+ Ooyasan ha Shishunki
+ Ouyasan wa Shishunki
+ Ouyasan ha Shishunki
+ Ouyasan wa Shishunki
+ Ouyasan ha Shishunki
+ Oyasan wa Shishunki
+ Oyasan ha Shishunki
+ Oyasan wa Shishunki
+ Oyasan ha Shishunki
+ Ōyasan wa Shishunki
+ Ōyasan ha Shishunki
+ Ōyasan wa Shishunki
+ Ōyasan ha Shishunki

What about this (http://anilist.co/anime/6956/WORKING)?

+ Sometimes it's called Working!!
+ Other times it's called Wagnaria!!
+ WHICH IS IT, OBAMA?

I can guarantee that whichever one you pick, some other site will pick another option. Unless you're willing to allow every permutation of every possible name for every possible anime as part of your standard format, things are going to be somewhat complicated.

Every site has different rating systems.

In the case of AniList, even a single site has multiple rating systems:

+ ?/10
+ ?/10.0
+ ?/100
+ ?/5 stars
+ :), :| and :(

How does the :)-iness of a rating on AniList translate to a percentage-based system, or even one where GRAPHICS/GAMEPLAY/SKUB are given as separate scores? Only smarties have the answer (https://en.wikipedia.org/wiki/Smarties).

For manga, some sites use volumes/chapters, some use chapters only.

Anime-Planet allows either volumes or chapters to be used. AniList only allows chapters. How many chapters are there in a volume? lol idk my bff jill?

OVA episodes: are they a series, or are they independent?
Anime-Planet categorises the various Angel Beats! animations as follows:

+ Angel Beats! (13 eps)
+ Angel Beats! Another Epilogue
+ Angel Beats! Hell's Kitchen
+ Angel Beats! Special

AniList, however, categorises them like this:

+ Angel Beats! (13 eps)
+ Angel Beats!: Another Epilogue
+ Angel Beats! Specials (2 eps)

Which is correct? Neither, both options are equally valid... Obviously Anime-Planet is wrong, my loyalty to AniList is absolute and unwavering, all hail AniList.

tl;dr

+ Anime-Planet has no export button.
+ Doing things the right way doesn't work.
+ Selenium: Not The Worst Solution
+ Standards = Never.
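
For anyone curious what the "dump the data in XML format" step might look like, here is a minimal Python sketch (sorry, Ruby). It is not the script I wrote, and everything in it is an assumption: the shape of the scraped records, the Anime-Planet status names, and the MAL-style element names (`series_title`, `my_watched_episodes`, `my_score`, `my_status`) are all from memory rather than from any official spec. It also squashes a ?/5 rating into a ?/10 one, which is exactly the sort of lossy conversion I was grumbling about earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped records; the real JSON keys may well differ.
SAMPLE = [
    {"title": "Angel Beats!", "status": "watched", "episodes": 13, "rating": 4.5},
    {"title": "Rozen Maiden", "status": "watching", "episodes": 3, "rating": None},
]

# Assumed mapping from Anime-Planet-style statuses to MAL-style ones.
STATUS_MAP = {
    "watched": "Completed",
    "watching": "Watching",
    "want to watch": "Plan to Watch",
    "stalled": "On-Hold",
    "dropped": "Dropped",
}


def to_ten_point(rating, scale=5):
    """Rescale a rating out of `scale` to a whole number out of 10 (0 = unrated)."""
    if rating is None:
        return 0
    return round(rating * 10 / scale)


def build_mal_xml(entries):
    """Build a MAL-style XML string from a list of scraped entries."""
    root = ET.Element("myanimelist")
    for entry in entries:
        anime = ET.SubElement(root, "anime")
        ET.SubElement(anime, "series_title").text = entry["title"]
        ET.SubElement(anime, "my_watched_episodes").text = str(entry["episodes"])
        ET.SubElement(anime, "my_score").text = str(to_ten_point(entry["rating"]))
        # Unknown statuses fall back to "Plan to Watch" (an arbitrary choice).
        status = STATUS_MAP.get(entry["status"], "Plan to Watch")
        ET.SubElement(anime, "my_status").text = status
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mal_xml(SAMPLE))
```

A real importer would almost certainly want a numeric series ID for each entry as well, which means matching titles between sites, which is where the jillion-different-names problem comes back to bite you.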