
Writing Tests for a DynamoDB Application

While developing my side project I needed to write tests for my backend, which uses DynamoDB. Connecting to Amazon from my test code was obviously not the solution, so what was the right way to write these tests? I looked at the goamz library, which I use for DynamoDB interaction in my project. The answer turned out to be DynamoDB Local; the Makefile below downloads and runs it:

DYNAMODB_LOCAL_VERSION = 2014-10-07

launch: DynamoDBLocal.jar
    cd dynamodb_local_$(DYNAMODB_LOCAL_VERSION) && java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar

DynamoDBLocal.jar: dynamodb_local_$(DYNAMODB_LOCAL_VERSION).tar.gz
    mkdir -p dynamodb_local_$(DYNAMODB_LOCAL_VERSION)
    [ -f dynamodb_local_$(DYNAMODB_LOCAL_VERSION)/DynamoDBLocal.jar ] || tar -C dynamodb_local_$(DYNAMODB_LOCAL_VERSION) -zxf dynamodb_local_$(DYNAMODB_LOCAL_VERSION).tar.gz

dynamodb_local_$(DYNAMODB_LOCAL_VERSION).tar.gz:
    curl -O https://s3-us-west-2.amazonaws.com/dynamodb-local/dynamodb_local_$(DYNAMODB_LOCAL_VERSION).tar.gz

clean:
    rm -rf dynamodb_local_$(DYNAMODB_LOCAL_VERSION)*

Voila! This was the trick. The guys at Amazon built a small, client-side version of DynamoDB in Java so that people like me can test their applications without connecting to Amazon.

Using this local DynamoDB application I was able to write a small DynamoDB backend for gorilla sessions and tests for my side project.
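
If you script your tests around make launch, it is worth waiting until DynamoDB Local actually accepts connections before the test suite starts. Here is a minimal sketch of such a guard (plain Python rather than the project's Go test code; the helper name is mine, and it assumes DynamoDB Local's default port 8000):

import socket
import time

def wait_for_local_dynamodb(host='127.0.0.1', port=8000, timeout=10):
    # Poll until DynamoDB Local accepts TCP connections, or give up
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=1).close()
            return True
        except socket.error:
            time.sleep(0.5)
    return False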

Happy hacking.

Flask file descriptor inheritance problem

Last month a friend of mine was writing a service monitoring and control app using Flask. The app simply checked the status of services on the server, and users could also start, stop and restart them from the app.

In about a week my friend came across an unexpected bug: file descriptor inheritance. He was complaining about "port already in use" errors. When I checked who was listening on that port using netstat, it was interesting: the services managed by the app were listening on the app's port.

crazy-server# netstat -nlp|grep 8000
tcp        0      0 0.0.0.0:8000           0.0.0.0:*               LISTEN      1958/mysql

The problem was obviously file descriptor inheritance: after the service monitor app started or restarted a service, its open file descriptors were inherited by the newly started processes.
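
The effect is easy to reproduce outside the app. Here is a minimal sketch (port 8000 and the sleep child are stand-ins I picked; it relies on Python 2 semantics, where subprocess does not close inherited descriptors by default):

import socket
import subprocess

# Stand-in for the Flask app: hold a listening socket on port 8000
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('0.0.0.0', 8000))
srv.listen(5)

# Stand-in for "restart a service": the child inherits srv's descriptor,
# so port 8000 stays busy even after this process exits
subprocess.Popen(['sleep', '300'])

Kill the parent and netstat will still show the sleep process holding port 8000, just like mysql in the output above.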

The solution was setting the FD_CLOEXEC flag using fcntl. To do that, we first had to find the open file descriptors; since Flask was only creating sockets, we only needed the socket objects. Using Python's garbage collector we fetched all socket objects:

filter(lambda x: type(x) == socket._socketobject, gc.get_objects())

After that, we set the FD_CLOEXEC flag on all socket file descriptors:

import fcntl
import gc
import socket

for sock in filter(lambda x: type(x) == socket._socketobject, gc.get_objects()):
    fd = sock.fileno()
    old_flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, old_flags | fcntl.FD_CLOEXEC)

And our fd inheritance bug was solved.

Happy hacking.

Large file parsing with Python

This week I came across an interesting problem. While I was parsing a big file using sed, I hit the error "Argument list too long". I knew that error, and sed was right: I was working on a big file with long lines. I also had unnecessary loops in my code that I wanted to eliminate, maybe with a hash-like structure.

So I started looking for a solution. A friend suggested Perl’s Tie::File module, but I was too lazy to install the needed RPMs on my machine; also, I needed a hash-like structure (maybe Tie::File::AsHash solves that problem, whatever).

And then I came across Python’s shelve module. That was my solution: parsing with a Python dictionary, without memory concerns, because everything is kept in a DB-like file. Great!

Basically, what the shelve module does is let you use a dictionary-like object in your code while that dictionary is kept on the file system, not in memory.
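
Here is the idea in isolation, as a minimal sketch (the /tmp/demo_shelf path is just an example): whatever you put in the dictionary survives closing and reopening the shelf.

import shelve

# Store something and close; the data ends up in a file under /tmp
db = shelve.open('/tmp/demo_shelf')
db['math'] = ['osman']
db.close()

# Reopen later, even from another process: the entry is still there
db = shelve.open('/tmp/demo_shelf')
print(db['math'])   # ['osman']
db.close()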

Now, as a fuller example, I will implement a basic version of the GNU join command to illustrate how shelve works.

Assume we have two files.

The first one has lines of the form COURSENAME|STUDENTNAME1,STUDENTNAME2.
The second one has lines of the form STUDENTNAME|COURSE_STUDENT_TAKES.

Here is our code

import shelve

# Files to be merged
FILE1 = '/tmp/file1'
FILE2 = '/tmp/file2'

# Shelve file that will hold the dictionary on disk
SHELVE = '/tmp/shelve'

# Open the shelve file; my_dict behaves like a dict but lives on disk
my_dict = shelve.open(SHELVE)

# Read the first file (COURSENAME|STUDENT1,STUDENT2) and create the entries
with open(FILE1, 'r') as f1:
    for line in f1:
        s_line = line.strip().split('|')

        # Start the course's list with the students from file1
        # (they are already comma-separated)
        my_dict[s_line[0]] = [s_line[1]]

# Read the second file (STUDENTNAME|COURSE) and extend the entries
with open(FILE2, 'r') as f2:
    for line in f2:
        s_line = line.strip().split('|')

        # Append the student to the course's list. Note the reassignment:
        # without writeback=True, shelve only persists an entry when it is
        # assigned, so an in-place .append() would be lost.
        my_dict[s_line[1]] = my_dict[s_line[1]] + [s_line[0]]

# Print out the merged dictionary
for key in my_dict.keys():
    print key + '|' + ','.join(my_dict[key])

# Close the shelve file
my_dict.close()

Example input and output are shown below.

======== File1 ========
math|osman
phys|ayse,fatma
 
======== File2 ========
hasan|phys
huseyin|math
 
======== Output ========
math|osman,huseyin
phys|ayse,fatma,hasan

Happy hacking.

Edit 1: It’s really slow with big data, be careful.