This week I came across an interesting problem. While parsing a big file with sed, I hit the error "Argument list too long". I knew that error, and sed was right: I was working on a big file with long lines. I also had unnecessary loops in my code that I wanted to eliminate, maybe with a hash-like structure.
So I started looking for a solution. A friend suggested Perl's Tie::File module, but I was too lazy to install the needed RPMs on my machine; I also needed a hash-like structure (maybe Tie::File::AsHash solves that problem, whatever).
Then I came across Python's shelve module. That was my solution: parsing with a Python dictionary without memory concerns, because everything is kept in a DB-like file. Great!
Basically, the shelve module gives you a dictionary-like object to use in your code, and that dictionary is kept on the file system, not in memory. Keys must be strings, and values can be any object that pickle can serialize.
Enough talk, let's show some code. As an example I will implement a basic version of the GNU join command to illustrate how shelve works.
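To make that concrete, here is a minimal sketch of the basic shelve workflow. The shelf path and the sample course data are made up for illustration; only `shelve.open`, dictionary-style access, and `close` come from the standard library.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path works.
shelf_path = os.path.join(tempfile.mkdtemp(), "demo_shelf")

# Open (or create) the shelf and use it like a dictionary.
db = shelve.open(shelf_path)
try:
    db["math"] = ["alice", "bob"]   # keys are strings, values get pickled
    db["physics"] = ["carol"]
finally:
    db.close()                      # flush everything to disk

# Reopen the shelf: the data is read back from the file, not from memory.
db = shelve.open(shelf_path)
try:
    courses = sorted(db.keys())
    math_students = db["math"]
finally:
    db.close()
```

One thing to watch: changing a stored list in place (for example `db["math"].append("dave")`) is not saved unless you assign the value back to the key or open the shelf with `writeback=True`.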
Assume we have two files.
The first one has lines like COURSE_NAME | STUDENT_NAME1, STUDENT_NAME2
The second one has lines like STUDENT_NAME | COURSE_STUDENT_TAKES
Here is our code
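The original listing is not reproduced here, so the following is only a rough sketch of how such a join could look with shelve, not the exact code from this post. It assumes the join key is the student name and that the wanted result is one line per student listing every course found for that student in either file. The function names `join_files` and `add_course` and the output format are hypothetical.

```python
import shelve


def add_course(db, student, course):
    """Record one (student, course) pair in the shelf."""
    if not student or not course:
        return
    courses = db.get(student, [])
    if course not in courses:
        courses.append(course)
    db[student] = courses  # assign back so the change is written to disk


def join_files(courses_path, students_path, shelf_path, out_path):
    """Join both files on student name, using a shelf as the lookup table."""
    db = shelve.open(shelf_path)
    try:
        # File 1: COURSE_NAME | STUDENT_NAME1, STUDENT_NAME2
        with open(courses_path) as f:
            for line in f:
                if "|" not in line:
                    continue  # skip blank or malformed lines
                course, students = line.split("|", 1)
                for student in students.split(","):
                    add_course(db, student.strip(), course.strip())

        # File 2: STUDENT_NAME | COURSE_STUDENT_TAKES
        with open(students_path) as f:
            for line in f:
                if "|" not in line:
                    continue
                student, course = line.split("|", 1)
                add_course(db, student.strip(), course.strip())

        # Assumed output format: STUDENT_NAME | COURSE1, COURSE2
        with open(out_path, "w") as out:
            for student in sorted(db.keys()):
                out.write("%s | %s\n" % (student, ", ".join(sorted(db[student]))))
    finally:
        db.close()
```

Note that `sorted(db.keys())` pulls all keys into memory; for a really huge key set you would iterate over the shelf directly and accept unsorted output.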
Example input/output will be like below.
Happy hacking.
Edit 1:
It's really slow with big data, so be careful :) In that case (IMHO) use a non-relational DB; MongoDB is a good one.
Installing Elasticsearch, Haystack and Django
We are using:
- CentOS 5
- Django 1.4.3
- Haystack 2.0.0-beta
- Elasticsearch 0.20.1
Installation is straightforward:
Now let's create our Django project:
Now we should edit 'settings.py' in order to use elasticsearch in Django:
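For reference, a Haystack 2.x setup for Elasticsearch usually boils down to a `HAYSTACK_CONNECTIONS` entry like the sketch below. The URL (a local node on the default port 9200) and the index name are assumptions; adjust them to your own setup. `'haystack'` also has to be listed in `INSTALLED_APPS`.

```python
# settings.py (fragment)
# Remember to add 'haystack' to INSTALLED_APPS as well.

HAYSTACK_CONNECTIONS = {
    'default': {
        # Elasticsearch backend that ships with Haystack 2.x
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        # Assumed: Elasticsearch running locally on its default HTTP port
        'URL': 'http://127.0.0.1:9200/',
        # Assumed index name; any name works
        'INDEX_NAME': 'haystack',
    },
}
```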
And we are done with the installation part.
Senior Project
For our senior project we are building a real-time log viewing and filtering utility. We will be using tools like Django and Elasticsearch. From now on I will try to write about what we did and how we did it with these tools. I think I will be able to put our project on GitHub in June 2013 (maybe earlier).
That's all for now.