Imagining undiscovered species with neural networks

TLDR: Neural networks are cool. I trained a recurrent neural network on a list of around 25000 species names and made it generate its own, then built a Twitter bot that tweets one out every hour. The results are kinda funny.

I’ve always been interested in the application of artificial intelligence techniques to ecology. There’s huge potential, and some very low-hanging fruit, in using machine learning to make predictions about species distribution and abundance, as well as a whole list of other things like evolutionary processes and taxonomic classification. Generative neural networks can also produce some surprising and funny results. I see it as a form of uncanny valley, where the results are similar enough to be plausible, but strange enough to cause a moment of cognitive disconnect: koans in AI-generated text. These texts give insight into our own linguistic and syntactic abilities. What is normal language, and what are the rules by which we produce and recognise it? Why is it so funny when those rules are broken?

I was recently inspired and amused after coming across an article on Janelle Shane’s blog, in which she lists recipe names generated by a recurrent neural network trained on a corpus of about 30 000 real recipe titles. Janelle has also turned her neural network to several other text sources, including Irish folk songs, knock knock jokes, Pokemon names, and full recipes themselves, all with hilarious results. I decided that I would take up the torch (you’ll get the pun in a minute) and have a go at producing a neural network capable of generating plausible species names.

Firstly, I needed to gather a training dataset. To broaden the appeal and increase the comedic value of the output, I decided that producing common names for each species was important, which meant I needed a dataset of species that had common names. This was a little hard to find. In the end I cobbled together lists of around 12600 animals and 14700 plants from various online data sources, including the Atlas of Living Australia and the Global Biodiversity Information Facility. For each species I included the family name, the species binomial, and a list of common names. I kept the plants and the animals separate. The animals dataset was quite heavy in marine creatures, which is visible in the output – the network generates a lot of eels, fish, and crabs.

So how’d I actually do it? And what is ‘training’ a neural network? There’s a very good explanation of how recurrent neural networks work here, although that may be a little technical for most. The simple explanation (and this is about the extent to which I understand it properly, so feel free to update or add to my knowledge in the comments) is this: the neural network reads the source text one character at a time, and makes a guess about what character will come next based on all the previous characters it has read. Then it checks that next character, and updates its model according to whether it guessed correctly or not. The network doesn’t know anything about English; it doesn’t know anything about the subject matter; it just sees each character as a vector within a probabilistic space and it builds a model around those probabilities. How do you actually train it? That part is pretty simple, thanks to the great tools that have been built in this space over the last few years.
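To make the ‘guess the next character’ idea concrete, here is a minimal sketch in Python. It is not a recurrent neural network and not what torch-rnn does internally: it is a much simpler stand-in that just counts which character follows each short run of characters, then samples from those counts. The tiny corpus, the function names, and the `order` parameter are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def train(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def generate(counts, seed, length, rng=None):
    """Extend `seed` by sampling each next character from the counts.

    `seed` must be at least as long as the context length used in training.
    """
    rng = rng or random.Random()
    order = len(next(iter(counts)))
    out = seed
    for _ in range(length):
        followers = counts.get(out[-order:])
        if not followers:  # context never seen in training: stop early
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        out += rng.choices(chars, weights=weights)[0]
    return out

# Tiny illustrative corpus (not the real training data): family, binomial,
# common name, with a blank line between records.
corpus = (
    "Muraenidae\nMuraena helena\nMoray eel\n\n"
    "Mugilidae\nMugil cephalus\nSea mullet\n\n"
)
model = train(corpus, order=3)
print(generate(model, "Mur", 40, random.Random(0)))
```

A real recurrent network differs in one important way: it carries a learned summary of everything it has read so far, rather than a fixed window of three characters, which is how it can pick up longer-range habits such as the overall layout of a record.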

I used torch-rnn, a recurrent neural network package for the Torch (there’s the pun!) scientific computing framework. I followed the excellent guide here, and while I had to dive into GitHub issues a couple of times to solve installation glitches, and even had to modify the source code to run on my machine, I got it up and going within an hour or so. Training took a while on my GPU-less MacBook Air – around 12 hours each for the animal and plant datasets. At the end of that process I issued commands that asked the neural network to generate sets of species names based on the animal and plant models it had developed. The output went into a text file ready for my Twitter bot to tweet. I generated enough to keep the bot going for around a year at one tweet per hour.

The results don’t quite have the comedic value of Janelle Shane’s recipe titles, but biologists might find them amusing, and it is really interesting how the AI has learned many of the rules of species naming – that plant families should end with -aceae, and animal families with -idae, and that species names should have a ‘Latin’ feel to them. In many cases it even used real family names, and sometimes genus names, I guess because there are few enough of them that it could learn that the whole word was commonly used. It learned that species names should be in two parts, and that common names often include hyphens, possessives, and terms like ‘weed’, or might end with ‘fish’ or ‘rose’. Of course, there are many times it gets those things wrong too – sometimes producing a family-like name in place of a species name, which resulted in the bot tweeting a species-like name in place of the common names, or else just combining things in some unrecognisable fashion.
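The naming rules described above are regular enough to check mechanically. The sketch below is one hypothetical way to screen generated output before tweeting it, and is not something the original bot is known to do. It assumes each record is three lines (family, binomial, common names) separated by blank lines, and the sample names are invented for the example.

```python
import re

# Rough shape checks only. Real nomenclature has exceptions (for example,
# older plant family names that do not end in -aceae).
FAMILY = re.compile(r"^[A-Z][a-z]+(aceae|idae)$")
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")

def parse_records(raw):
    """Split generated text into (family, binomial, common_names) tuples."""
    records = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) == 3:  # drop anything that is not a complete record
            records.append(tuple(lines))
    return records

def looks_plausible(record):
    """True if the family and binomial lines have the expected shape."""
    family, binomial, _common = record
    return bool(FAMILY.match(family)) and bool(BINOMIAL.match(binomial))

# Invented sample output: the second record has a family-like name where
# the binomial should be, one of the failure modes mentioned above.
raw = (
    "Muraenidae\nMuraena lotica\nSpotted sand-eel\n\n"
    "Fabaceae\nAcaciaceae\nWeed rose\n"
)
records = parse_records(raw)
good = [r for r in records if looks_plausible(r)]
print(len(records), "records,", len(good), "plausible")
```

Whether to filter at all is a matter of taste: the malformed names are arguably part of the fun.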

The bot itself is pretty standard; using the Tweepy library really makes it easy to set up a Twitter bot. It runs on my Raspberry Pi, and is triggered by a cron job every hour.
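The bot’s code isn’t shown here, so the following is only a sketch of how an hourly, cron-driven bot of this kind could work: take the first line from a text file of generated names, post it, then remove it from the file. The file layout, the function names, and the placeholder credentials are all assumptions. The posting function uses Tweepy’s `Client.create_tweet`, which may not be the interface the original bot used.

```python
from pathlib import Path

def post_with_tweepy(text):
    """Post one tweet with Tweepy. The credentials are placeholders."""
    import tweepy  # imported lazily so the queue logic runs without Tweepy
    client = tweepy.Client(
        consumer_key="...",
        consumer_secret="...",
        access_token="...",
        access_token_secret="...",
    )
    client.create_tweet(text=text)

def run_once(queue_path, post=post_with_tweepy):
    """Post the first queued line, then remove it from the queue file.

    Returns the posted text, or None if the queue is empty.
    """
    path = Path(queue_path)
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines()
             if ln.strip()]
    if not lines:
        return None
    post(lines[0])  # if this raises, the queue file is left untouched
    path.write_text("".join(ln + "\n" for ln in lines[1:]), encoding="utf-8")
    return lines[0]

# Hypothetical usage from a script that cron runs at the top of every hour,
# e.g. with a crontab entry like:  0 * * * * python3 /path/to/bot.py
#
#     run_once("tweets.txt")
```

Posting before rewriting the file means a failed request never loses a name. At one tweet per hour, a year’s supply is 24 × 365 = 8760 lines.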

Follow Undiscovered Species on Twitter to keep up with the names.

Pilbara Adventures

Definitely long overdue for an update. While I have a few things to catch up on from late last year, I’ve recently spent some time botanising in the Pilbara, near Newman and Port Hedland, and I’ll post about the Newman work now.

I spent a couple of weeks working near Newman, spread over two periods in November last year and March this year. Both jobs were flora surveys for mining proposals, and both involved the use of a helicopter. I was quite excited about this at first, but the fun soon wore off. The helicopter was small, cramped, hot, noisy, and difficult to get in and out of. It also imparted a sense of hurry to the work, which I didn’t enjoy: the pilot would often leave the engine and rotors running while waiting for me to finish off a site.

helicopter at sunrise

Another day begins

It was a great way to experience the landscape though, and being able to see changes in vegetation types from above made the work easier. Every landscape has its optimal viewing distance for maximum visual beauty, and the stony hills of the Pilbara are best seen from about 100m up.

Stony hillsides with creeklines are common land forms in this part of the Pilbara

Being in the helicopter also enabled me to get an interesting (and sometimes slightly scary) angle on the weather. On most days our work was cut short by the development of thunderstorm cells, and several times we dodged around active thunderstorms on our afternoon commute back to our camp.

Thunderstorms from the air

The weather was also interesting to photograph from the ground. This is the sky after a storm passed over us at Sylvania Station, where we camped:

From the helicopter, the number and extent of mining activities in this part of the country were mind-boggling. Our daily commute took about an hour, and in that time we would pass over three active mines and several areas of development that appeared to be planned mines, as well as various camps and other infrastructure such as railways and roads. It seemed that there was some kind of construction project happening in every valley. If there were no major developments, then there would at least be drill pads on the ridges and sometimes the flats.

Mining landscape

The area north of Newman is known for its stony hills, gorges (exemplified by those at Karijini National Park), and open plains dissected by sandy creeks. The vegetation is ‘spinifex’ Triodia grasses with Acacia shrubs and/or sparse small Eucalyptus trees. Members of the Malvaceae (mallow) family are common shrubs, as are peas.

Typical Pilbara landscape

View from above. Iron plant (Astrotricha hamptonii) in foreground.

The pretty Cleome oxalidea occurred on clay flats in the area

I also encountered two mushrooms, both unusual for this area in my experience. The first was an Amanita, found emerging from a rocky hillside, with an unusual double annulus:

And a Coprinus, or related genus, found in a dense creekline:

And a final landscape:

Spring Wildflowers – Part 1

It’s still early spring, but the orchid season is in full swing in the South West. Although I haven’t been able to explore far afield yet, a few short walks have yielded an amazing number of species. I counted eight species of orchids in a short walk around Wireless Hill park in Applecross last week – most of the below photos are from this location. This may be due to the ‘near average’ rainfall – the best that Perth has had in ten years (BoM Winter Rainfall Summary).


Wireless Hill is itself an interesting place and has plenty of history. It was known as ‘Yagan’s lookout’, Yagan being the well known indigenous leader and freedom fighter in the early days of the Swan River Colony, whose story is one of the foundational parables of Noongar/White relations in Western Australia. The hill gets an excellent view both west towards Fremantle and north-east towards Perth and was obviously of some strategic importance for that reason.

There is currently a very active Friends Group who can be observed (or assisted) in looking after the bush. There are significant ecological problems, as there usually are in small urban remnant bushlands, especially weed invasion and frequent fire. However, the Friends Group is actively removing the weeds, and in some areas I was greatly impressed by the health of the native vegetation, no doubt due to their hard work. They host a list (Microsoft Word .doc file) of the flora that have been recorded there.

This beautiful group of Caladenia discoidea, the Dancing Orchid, was found at Wireless Hill. I was excited to see these as I had never seen this species before.

Another shot of Caladenia discoidea flowers. This species is characterised by its short petals and flattened, disc-like labellum. There are often stripes on the petals, and the petal colour is variable, ranging between yellow, white, and pinkish.

A Jug Orchid, Pterostylis recurva.

Caladenia arenicola
The Carousel Spider Orchid, Caladenia arenicola. The specific epithet means ‘from sand’, indicating this species grows on sandplain country. Wireless Hill is also host to several other spider orchids; I saw Caladenia longicauda on the day I visited, and the very rare and very large Grand Spider Orchid, Caladenia huegelii has also been recorded there.

Although maybe not as exotic and fascinating as the orchids, Anigozanthos manglesii is quite spectacular and is the floral emblem of our state. I have rarely seen them as healthy and as numerous (except in horticulture) as I saw them recently at Wireless Hill.

Pink fairy orchid
The Pink Fairy Orchid, Caladenia reptans subsp. reptans, found at a farm in the Capercup area of the south west, between Collie and Kojonup, growing in Wandoo woodland habitat.

I’ll follow up this post with further updates as the wildflower season progresses; it’s going to be a good one. My next trip is out to the goldfields, north of Southern Cross, so I hope to capture some of the beauty of the dry country.

Hunting fungi in the South West – Part 1

A post from the present: winter is here and so far, southwest WA has had a small but significant amount of rainfall. That means fungi! Now that the season has started, I’ll be posting photos of my mushroom finds throughout the winter. Today’s post is images from the area of Nannup, where I stayed with some friends and took some time to explore.

Trametes versicolor on a log
Trametes versicolor, Turkey Tail mushroom, growing on a log near Nannup, WA.

One of my current main interests is the medicinal mushrooms of the Polyporaceae, including Trametes versicolor and Ganoderma species. There is significant evidence that these mushrooms can be used to treat some types of cancer, some viruses, and a range of physiological diseases. Trametes, called the Turkey Tail mushroom because of its concentric rings of colour, is also interesting biologically and ecologically. It occurs in many countries around the world, grows on many different types of wood, and has even been found to be able to decompose the explosive trinitrotoluene (TNT). It is highly variable in colour, as you can see from the following photo, which shows much paler specimens. Other mushrooms of this species that I have seen have quite striking concentric rings of brown, grey, and white.

Trametes versicolor mushrooms and some leaves
More Trametes versicolor Turkey Tail mushrooms. Near Nannup, WA.

Another fungus that I saw in the jarrah forest was this Gymnopilus species. I suspect this one is G. purpuratus, also known as Laughing Gym, a species that is also reputed to have ethnopharmacological potential as an hallucinogen. It does not appear to have entered the Western Australian psychoactive mushroom seekers’ culture to the extent that a certain other taxon has, probably because it is from a taxonomic group that is not well known globally for its psychoactivity. It may also be weaker or more variable in potency, and hence less reliable as a drug, but the common name speaks volumes. There are very few internet reports of its use. I admired its perfect form and fleshy orange skin without the temptation to perform any pharmacological analysis using my own neurology. I have found them growing in damp areas on Banksia and Melaleuca logs many times before. This one was growing from an old jarrah log.

EDIT: I have since brushed up on my Gymnopilus identification skills, and I do not believe this one to be G. purpuratus – however I don’t know what it is! Any suggestions welcome. I am still learning about these fungi.

Gymnopilus mushrooms growing on a log
Gymnopilus sp., possibly G. purpuratus. Near Nannup, WA.

There are several large areas of pine plantation near Nannup, and I went for a wander to see what I could find amongst the leaf litter. The most common were Slippery Jacks, Suillus luteus, a large mycorrhizal mushroom in the bolete family. These are reputedly edible, and I have eaten them; but I will continue to refer to them as ‘reputedly edible’. To make them palatable, you are advised to remove the tough skin from the cap, and also the pores underneath the cap, leaving only a small wad of mushroom flesh. This may then cause mild to severe gastrointestinal upset, although some people continue to claim they eat them without a problem. I ate them in Ecuador, picked from a pine forest near Vilcabamba. I found the taste acceptable, but the sliminess was a bit much for me, and the meal left me with a vague nausea. I won’t eat them again unless I have to.

Also amongst the pines in Nannup I found these tiny mushrooms which are probably in the genus Mycena. Due to a few dry days, the caps had started to shrivel, but they were still pretty cute.

Macro photo of tiny mushrooms amongst pine needles
Some tiny mushrooms of an unidentified species growing under the pines

After a day of fungal foraging, I was rewarded by this spectacular sunset through the trees. The light was like honey, trickling through the smoke from a fire on a neighbouring property.

A beautiful sunset behind trees
Red sunset

I’ll be adding further posts over the next month or two with my winter adventures. Enjoy.