Here at ChiTownBio, we encourage exploration and the DIY mindset in more areas than just biology. We love to meet people who are avid makers and programmers as well! In fact, programming is a super useful skill for DIY biology in both analyzing data and designing low-cost hardware, so I’ve been teaching myself some new tricks, and I thought I’d show you all what I’ve been working on.
“Function Using A Real-Ten Value of Multiviral Hepatocellametric Active Transgenic Chemotherapy”
“Dependent Interfering Variants to Malaria for Absendribility on Diagnosis and Miral Protein Regulation”
Do those sound like gibberish to you? Congratulations! They are! Those titles were generated by a neural network trained on all of the science articles ever published by the journal PLOS or its affiliates. PLOS is an open-access journal, which means you never have to pay to read their articles, which is great for community labs and citizen scientists. In the open science spirit, they also have made available the entire PLOS corpus in a downloadable format for text and data mining and analysis, which you can find here.
I find programming to be a creative pursuit–no two people will come up with exactly the same approach to a question. Also in the spirit of IAmSciArt, I’ll pose the question, is this Twitter account performance art? What does this say about the accessibility of scientific language, if a computer can generate fake jargon almost indiscernible from the real thing to a layperson?
I’m working on putting all the code for this project on GitHub, if you want to check it out (it’s a work in progress). I used the machine learning code from Sean Robertson (Github here) and cleaned up all the PLOS titles myself by iterating through 200,000 XML files hahahahaha that was fun (joking). Then I made a Twitter bot (well, not exactly a bot since I can’t figure out how to update it automatically) that tweets a generated title. Enjoy!
-Jordan