The GDELT Dataset
For this hackathon you will be working with the GDELT Project Dataset. The GDELT (Global Database of Events, Language, and Tone) Project monitors the world’s broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day.
How to Work with GDELT
Over the next few weeks we’ll be publishing blog posts and hosting live streams and AMA (ask me anything) sessions to help you with your GDELT and MongoDB journey. In the meantime, you have two options: you can work with our existing GDELT data cluster (containing the entirety of last year’s GDELT data), or you can load a subset of the GDELT data into your own cluster.
Work With Our Hosted GDELT Cluster
We currently host the past year’s GDELT data in a cluster called GDELT2. Once you have an Atlas account set up, you can access the cluster read-only using Compass, or any of the MongoDB drivers, with the following connection string:
mongodb+srv://readonly:readonly@gdelt2.rgl39.mongodb.net/GDELT?retryWrites=true&w=majority
The raw data is contained in a collection called “eventsCSV”, and a slightly massaged copy of the data (with Actors and Actions broken down into subdocuments) is contained in a collection called “recentEvents”.
We’re still making changes to this cluster, and plan to load more data in as time goes on (as well as keeping up-to-date with the 15-minute updates to GDELT!), so keep an eye out for the updates!
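As a rough illustration of connecting from a driver, here is a minimal Python sketch that uses the connection string and collection names given above. It assumes PyMongo is installed with SRV support (for example via pip install "pymongo[srv]"). The helper names default_database and fetch_sample are made up for this example, and the shape of the returned document is not documented here, so inspect it rather than relying on specific field names.

```python
from urllib.parse import urlsplit

# Connection string quoted in this post (read-only credentials).
GDELT_URI = (
    "mongodb+srv://readonly:readonly@gdelt2.rgl39.mongodb.net/"
    "GDELT?retryWrites=true&w=majority"
)

# Collection names mentioned in this post.
RAW_COLLECTION = "eventsCSV"
MASSAGED_COLLECTION = "recentEvents"


def default_database(uri: str) -> str:
    """Return the database name embedded in the URI path ("GDELT" here)."""
    return urlsplit(uri).path.lstrip("/")


def fetch_sample(collection_name: str, uri: str = GDELT_URI):
    """Return one arbitrary document from the named collection.

    Hypothetical helper for illustration; needs network access to the cluster.
    """
    # Imported here so the helpers above work without the driver installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=10_000)
    try:
        return client[default_database(uri)][collection_name].find_one()
    finally:
        client.close()
```

Calling fetch_sample(MASSAGED_COLLECTION) and printing the result is a quick way to see which fields are available before writing real queries.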
How to Get GDELT into Your Own MongoDB Cluster
There’s a high likelihood that you won’t be able to work with the data in its raw form. For one reason or another you may need the data in a different format, or filtered in some way, to work with it efficiently. In that case, I highly recommend you follow Adrienne’s advice in her GDELT Primer README.
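To give a feel for the kind of reshaping this might involve, here is a small Python sketch that groups flat, prefix-named fields into subdocuments. It is only an assumption about what a useful format could look like, not the transformation used for our hosted collections or the one in the primer. The function name nest_by_prefix and the sample values are invented for the example; the field names follow GDELT’s event column naming (Actor1Code, Actor1Name and so on).

```python
def nest_by_prefix(flat_event: dict, prefixes=("Actor1", "Actor2")) -> dict:
    """Group keys that start with one of the prefixes into subdocuments.

    For example, "Actor1Code" becomes nested["Actor1"]["Code"].
    Keys that match no prefix are copied through unchanged.
    """
    nested = {}
    for key, value in flat_event.items():
        for prefix in prefixes:
            # Require at least one character after the prefix.
            if key.startswith(prefix) and len(key) > len(prefix):
                nested.setdefault(prefix, {})[key[len(prefix):]] = value
                break
        else:
            nested[key] = value
    return nested


# Made-up sample row using GDELT-style column names.
sample = {
    "GlobalEventID": 1,
    "Actor1Code": "USA",
    "Actor1Name": "UNITED STATES",
    "Actor2Code": "GBR",
    "EventCode": "042",
}
reshaped = nest_by_prefix(sample)
```

Nesting like this keeps related fields together, which can make documents easier to read and query by actor.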
In the next few days we’ll be publishing a tool to efficiently load the data you want into a MongoDB cluster - bear with us. In the meantime, read up on GDELT, have a look at the sample data, and find some teammates to build with!
Further Reading
The following documents contain most of the official documentation you’ll need for working with GDELT. We’ve summarized much of it here, but it’s always good to check the source, and you’ll need the CAMEO encoding listing!
Please reply below with any questions you may have regarding GDELT and we’ll endeavour to answer them as quickly as we can.