Archive for September, 2004

Genlop

Thursday, September 30th, 2004

I’ve just emerged genlop after reading about it in a tutorial about Subversion. As its man page says, genlop parses Portage logfiles for information.

Quoting its man page:

- Nice colorful output.
- Full Portage merge and unmerge history.
- Display date, time and build time of every merge.
- Display total and average build time of selected package[s].
- Estimate upgrade time.
- Watching current merge progress.
- Use alternate portage logfile(s).
- Match package names using regular expressions.
- Log corruption detection.

The most useful feature I found is being able to see how long an emerge took.
The syntax is easy:

root@apple esteve [-1m] # genlop xorg-x11 -t
* x11-base/xorg-x11

Wed Jul 28 18:46:18 2004 --> x11-base/xorg-x11-6.7.0-r1
merge time: 1 hour, 11 minutes, and 57 seconds.

Tue Sep 21 19:52:37 2004 --> x11-base/xorg-x11-6.8.0-r1
merge time: 45 minutes and 58 seconds.

merged totally 2 ebuilds in 1 hour, 57 minutes, and 55 seconds.
average merge time: 58 minutes and 57 seconds.
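
The numbers above can be reproduced from two timestamps per merge. Here is a minimal sketch in Python, not genlop's own code. It assumes the Portage log marks the start of a merge with a ">>> emerge" line and the end with a "::: completed emerge" line, each prefixed with a Unix timestamp. The sample lines and their timestamps are made up so that the durations match the output above, and merge_times and fmt are hypothetical helper names.

```python
import re

# Made-up sample lines in the layout this sketch ASSUMES for the Portage log:
#   "<epoch>:  >>> emerge (n of m) <package> to /"            (merge starts)
#   "<epoch>:  ::: completed emerge (n of m) <package> to /"  (merge ends)
SAMPLE_LOG = """\
1091033178:  >>> emerge (1 of 1) x11-base/xorg-x11-6.7.0-r1 to /
1091037495:  ::: completed emerge (1 of 1) x11-base/xorg-x11-6.7.0-r1 to /
1095789157:  >>> emerge (1 of 1) x11-base/xorg-x11-6.8.0-r1 to /
1095791915:  ::: completed emerge (1 of 1) x11-base/xorg-x11-6.8.0-r1 to /
"""

START = re.compile(r"^(\d+):\s+>>> emerge \(\d+ of \d+\) (\S+) to")
END = re.compile(r"^(\d+):\s+::: completed emerge \(\d+ of \d+\) (\S+) to")


def merge_times(log_text, name):
    """Return (package, seconds) for each completed merge whose name contains `name`."""
    started = {}  # package -> timestamp of its ">>> emerge" line
    result = []
    for line in log_text.splitlines():
        match = START.match(line)
        if match:
            started[match.group(2)] = int(match.group(1))
            continue
        match = END.match(line)
        if match and match.group(2) in started and name in match.group(2):
            package = match.group(2)
            result.append((package, int(match.group(1)) - started.pop(package)))
    return result


def fmt(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


durations = [seconds for _, seconds in merge_times(SAMPLE_LOG, "xorg-x11")]
total = sum(durations)
average = total / len(durations)

print("total:  ", fmt(total))    # 1h 57m 55s
print("average:", fmt(average))  # 0h 58m 57s
```

The total and average agree with what genlop reported for the two xorg-x11 merges.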

Update:
As Pau noted in a comment, another useful option is -i:

-i extra infos for the selected package (build specific USE and CFLAGS variables, average build time, etc)

LatexRender as plug-in for WordPress

Thursday, September 23rd, 2004

I’ve just installed LatexRender and its plugin for WordPress. LatexRender lets you insert LaTeX formulas without any trouble. Its dependencies are teTeX, ImageMagick and Ghostscript.

Some Tests:

This one comes from the LatexRender homepage:
[tex]\displaystyle\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx=
\frac{22}{7}-\pi+\frac{\gamma}{\delta}[/tex]

From a university lab exercise:
[tex]
A(R,p) = \gamma(R,p) L_{ef} = k\;R^{\alpha}\;L_{ef}
[/tex]

These are from the LatexRender examples:
[tex]
\begin{displaymath}
\sum_{\substack{0<i<n \\ 1<j<m}} P(i,j) =
\sum_{\begin{subarray}{l} i\in I\\
1<j<m \end{subarray}} Q(i,j)
\end{displaymath}
[/tex]


[tex]
\begin{displaymath}
\mathbf{X} =
\left( \begin{array}{ccc}
x_{11} & x_{12} & \ldots \\
x_{21} & x_{22} & \ldots \\
\vdots & \vdots & \ddots
\end{array} \right)
\end{displaymath}
[/tex]


[tex]
\setlength{\unitlength}{1mm}
\begin{picture}(60,40)
\put(30,20){\vector(1,0){30}}
\put(30,20){\vector(4,1){20}}
\put(30,20){\vector(3,1){25}}
\put(30,20){\vector(2,1){30}}
\put(30,20){\vector(1,2){10}}
\thicklines
\put(30,20){\vector(-4,1){30}}
\put(30,20){\vector(-1,4){5}}
\thinlines
\put(30,20){\vector(-1,-1){5}}
\put(30,20){\vector(-1,-4){5}}
\end{picture}
[/tex]

References:

Mathematics Weblog
WP Plugin Page
WP Latexrender plugin
Mimex Homepage

Indexing MP3 and PDF files with python

Tuesday, September 21st, 2004

Continuing with the indexing filesystem from the last post, I’ve implemented parsing support for both MP3 metadata and PDF contents. Both work the same way: when you put a PDF or an MP3 file in the indexed directory, a parser extracts the contents of the file and indexes them.

For an MP3, it indexes every word contained in the tag; for a PDF, it converts the file to text (using pstotext) and indexes every word. As I said in the last post, I’m not after performance. I know there are better ways of indexing a PDF, but I just want to see what an indexed filesystem would look like.
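
The PDF path can be sketched roughly as follows. This is a minimal illustration, not the code linked below. It assumes pstotext is installed and writes the extracted text to standard output, and it swaps the MySQL database for a plain in-memory dictionary to stay self-contained. The names pdf_to_text and index_text are made up for the example.

```python
import re
import subprocess
from collections import defaultdict

WORD = re.compile(r"\w+")


def pdf_to_text(path):
    """Convert a PDF to plain text by running the external pstotext tool.

    Assumption: pstotext is on the PATH and prints the text to stdout.
    """
    result = subprocess.run(
        ["pstotext", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def index_text(index, filename, text):
    """Add every word of `text` to `index` (word -> set of filenames).

    Returns (total words, unique words) for the indexed text.
    """
    words = [word.lower() for word in WORD.findall(text)]
    for word in words:
        index[word].add(filename)
    return len(words), len(set(words))


# In-memory stand-in for the database table of indexed words.
index = defaultdict(set)

# For a real file this would be: text = pdf_to_text("example.pdf")
text = "the quick fox jumps over the lazy fox"
total, unique = index_text(index, "example.pdf", text)

print(total, unique)         # 8 6
print(sorted(index["fox"]))  # ['example.pdf']
```

Searching for a word is then a single dictionary lookup. MP3 tags could go through the same index_text function once the tag fields are read into a string.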

The next thing to do is indexing by document type, so that it is possible to search by artist, song, PDF author and so on. The index also needs more information about each word it finds, such as the line of text where the word was found, the words that follow it, and so on.
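
One possible shape for that extra information is sketched below, under my own assumptions: each occurrence of a word is stored with its line number and the word that follows it on the same line. The function name index_with_context is made up, and nothing here is implemented in the indexer yet.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def index_with_context(text):
    """Map each word to a list of (line_number, following_word) occurrences.

    The following word is None when the word is the last one on its line.
    """
    occurrences = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        words = [word.lower() for word in WORD.findall(line)]
        for position, word in enumerate(words):
            if position + 1 < len(words):
                following = words[position + 1]
            else:
                following = None
            occurrences[word].append((line_number, following))
    return occurrences


sample = "Fuse lets you write filesystems\nGmailFs uses Fuse under Python"
hits = index_with_context(sample)

print(hits["fuse"])    # [(1, 'lets'), (2, 'under')]
print(hits["python"])  # [(2, None)]
```

Indexing by document type could work the same way, with a field name such as artist or author stored next to each occurrence.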

Mini Bench:
Indexing a 1.3 MB PDF file takes 55 seconds on my system. After that, searches for a word in MySQL are reported as taking 0 seconds. The file contained 77241 words, of which 7161 were unique.
From this little test it is obvious that the PDF parser is too slow for real use, but it is fine for testing.

Code:
Code for the Indexer with MP3 metadata and PDF support

References:
Magic Python: Determines file type
Pstotext homepage

Searchable Filesystem with Fuse-Python

Monday, September 20th, 2004

FUSE is a library with a set of functions that let you reimplement the VFS layer operations; in other words, it lets you write a filesystem in userspace. Not having to work in kernel space makes it really flexible, since we can use any external library.

I first became aware of Fuse when I read about GmailFs, which uses a Gmail account as structured storage and exposes it as a normal mountpoint on your system. GmailFs uses Fuse-Python, a set of bindings for using Fuse from Python. AVFS is another example of a Fuse filesystem: it treats tar files as if they were drives, so they can be mounted and accessed without having to untar them.

What I want to use Fuse for is to make the file indexer more autonomous. With the help of Fuse, one could have a Fuse mountpoint where all files are always indexed. When you perform a write operation, the indexer could reindex the changed portion; when you delete a file, it would automatically be removed from the index; and so on. This way there is no need to run the indexer as a daemon that checks for changed files to reindex.

To accomplish this I need to implement an indexing hook on the following operations:

Write
Unlink
Rename
Truncate

In the end we would have a regular directory that is always in sync with the indexing database.
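
The four hooks can be illustrated without Fuse at all. The sketch below is plain Python, deliberately not tied to the Fuse-Python API: a hypothetical IndexedDirectory class performs each operation on a real directory and updates an in-memory index right after it. In the real filesystem the same logic would live inside the corresponding Fuse callbacks.

```python
import os
import re
import tempfile


class IndexedDirectory:
    """Keeps an in-memory word index in sync with the files under `root`."""

    def __init__(self, root):
        self.root = root
        self.index = {}  # file name -> set of words found in that file

    def _reindex(self, name):
        path = os.path.join(self.root, name)
        with open(path, encoding="utf-8", errors="replace") as handle:
            self.index[name] = set(re.findall(r"\w+", handle.read().lower()))

    def write(self, name, data):
        with open(os.path.join(self.root, name), "w", encoding="utf-8") as handle:
            handle.write(data)
        self._reindex(name)  # hook: contents changed

    def unlink(self, name):
        os.remove(os.path.join(self.root, name))
        self.index.pop(name, None)  # hook: file is gone from the index too

    def rename(self, old, new):
        os.rename(os.path.join(self.root, old), os.path.join(self.root, new))
        self.index[new] = self.index.pop(old, set())  # hook: same words, new name

    def truncate(self, name, length):
        os.truncate(os.path.join(self.root, name), length)
        self._reindex(name)  # hook: contents changed

    def search(self, word):
        return sorted(
            name for name, words in self.index.items() if word.lower() in words
        )


with tempfile.TemporaryDirectory() as root:
    docs = IndexedDirectory(root)
    docs.write("notes.txt", "fuse python indexer")
    docs.rename("notes.txt", "ideas.txt")
    after_rename = docs.search("fuse")       # found under the new name
    docs.truncate("ideas.txt", 4)            # file now contains only "fuse"
    after_truncate = docs.search("python")   # no longer found
    docs.unlink("ideas.txt")
    after_unlink = docs.search("fuse")       # nothing left to find

print(after_rename, after_truncate, after_unlink)  # ['ideas.txt'] [] []
```

This version reindexes the whole file on every write or truncate. Reindexing only the changed portion would need the offset and length of each write.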

Implementation:
The main goal is just to make a prototype of what a searchable filesystem would look like, which is why I have chosen Python (and also because I’m in the process of learning it). The implementation will make use of an existing filesystem (reiser, ext, …). Fuse won’t have much to do: it just redirects all access from one directory to another.
Imagine we want an indexed directory called Documents. We would have a mirror in .Documents, where the files are stored using our normal filesystem. We would work normally under Documents; on a write operation, the hook would catch it and reindex the file, and then the file is saved in the mirror directory. The mirror directory makes our implementation independent of whichever filesystem the user is most comfortable with. We also avoid problems with data corruption, because all the data is stored in a reliable filesystem: we are adding a layer on top of our filesystem, not reinventing the wheel.
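
The redirection from Documents to .Documents comes down to a path translation. A minimal sketch, assuming the mirror sits next to the mount point and has the same name with a leading dot; mirror_path is a made-up helper name and the paths are examples only.

```python
import os


def mirror_path(mount_point, path):
    """Translate a path under the indexed directory into its mirror path.

    Assumption: the mirror sits next to the mount point and has the same
    name with a leading dot (Documents -> .Documents).
    """
    mount_point = os.path.normpath(mount_point)
    parent, name = os.path.split(mount_point)
    mirror_root = os.path.join(parent, "." + name)

    relative = os.path.relpath(path, mount_point)
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        raise ValueError(f"{path} is not inside {mount_point}")
    return os.path.normpath(os.path.join(mirror_root, relative))


# On a POSIX system this prints /home/user/.Documents/papers/fuse.pdf
print(mirror_path("/home/user/Documents", "/home/user/Documents/papers/fuse.pdf"))
```

Every hooked operation would call a function like this first and then run the real operation on the translated path.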

Code:
Code for the Indexing Filesystem

References
Fuse SF project
LWN Article on Fuse
Fuse Documentation
GmailFs

Simple Layout

Wednesday, September 15th, 2004

After having a lot of trouble with the previous layout, I decided to stay away from images and keep things as simple as possible. This way I won’t have compatibility problems with different browsers. Now it should work well with any decent browser.

If there is any problem, please leave a comment. Thanks.