R Language I/O for database tables


Specialized packages

Reading Data from MySQL Databases


Using the package RMySQL we can easily query MySQL as well as MariaDB databases and store the result in an R dataframe:


mydb <- dbConnect(MySQL(), user='user', password='password', dbname='dbname',host='')

queryString <- "SELECT * FROM table1 t1 JOIN table2 t2 on t1.id=t2.id"
query <- dbSendQuery(mydb, queryString)
data <- fetch(query, n=-1) # n=-1 to return all results

Using limits

It is also possible to define a limit, e.g. getting only the first 100,000 rows. In order to do so, just change the SQL query regarding the desired limit. The mentioned package will consider these options. Example:

queryString <- "SELECT * FROM table1 limit 100000"

Reading Data from MongoDB Databases

In order to load data from a MongoDB database into an R dataframe, use the library MongoLite:

# Use MongoLite library:
# Connect to the database and the desired collection as root:
db <- mongo(collection = "Tweets", db = "TweetCollector", url = "mongodb://USERNAME:[email protected]HOSTNAME")

# Read the desired documents i.e. Tweets inside one dataframe:
documents <- db$find(limit = 100000, skip = 0, fields = '{ "_id" : false, "Text" : true }')

The code connects to the server HOSTNAME as USERNAME with PASSWORD, tries to open the database TweetCollector and read the collection Tweets. The query tries to read the field i.e. column Text.

The results is a dataframe with columns as the yielded data set. In case of this example, the dataframe contains the column Text, e.g. documents$Text.