Wednesday, March 25, 2009

Google Alerts: watch the web growing

Google has a very cool service which is not as widely used as Mail or Docs...
It is Google Alerts. The idea is very simple: you go to their website and enter the keywords you want to be notified about.

Once the Google bot indexes a page that may be relevant to the search query you've provided, you get an e-mail message about it. A kind of 'Google bot news' :)

You can tune the service to send you an update each time a relevant page is found, or to pack these pages into a daily/weekly digest.

This becomes very useful when you watch some activity on the web, for example your partner's website.

Google Alerts is, in my opinion, an example of a minimalistic interface, and that's why it gets this tag :).

Tuesday, March 24, 2009

Different TRAC workflow for individual ticket types

TRAC is issue-tracking software, used to store and work with different issues during the software development process, be it bugs, development tasks, requirements, or feature requests.
It is free, flexible, has good SVN integration (in my opinion), and is quite minimalistic in its design.

And yes, it uses wiki markup heavily everywhere, so everything can link to everything.
But if you want to set up a custom workflow, and you want it to be different for different ticket types, you'll hit a wall. This is currently not implemented in the TRAC workflow.

The good news is that TRAC is Python-based, so its sources can be changed easily.

Different workflows for different ticket types in TRAC (TRAC ticket link :) ) describes how one can solve the problem by applying a small patch with the patch command. I have not tried it yet, but will report soon...

Denyhosts for Unix

I've had a Linux box sitting on the Internet (uptime ~180 days). It had just Ubuntu Server, Tomcat and sshd installed.
And I never looked at security logs...

The security log (/var/log/auth.log) was full of invalid login attempt messages!
I actually went there to see who had been accessing the machine during this time, and found that nearly all the entries were invalid login attempts over ssh: hackers seem to have been trying to brute-force the box.

So I found the article "Protect your ssh-servers, ban offending hosts" about protection; many thanks to its authors. Within two minutes all the offending hosts were in the hosts.deny file (denyhosts figured them out and banned them), and auth.log stopped growing because of invalid attempts.

What I've learned from that:
The basic idea is to set up a little tool that blocks connections from certain hosts and automatically adds a host to the deny list once it exceeds a threshold of invalid attempts.
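
To make the threshold idea concrete, here is a minimal sketch in Python. It is not how denyhosts itself is implemented; the log line format, the threshold value and the function names are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical threshold; denyhosts has its own configurable settings.
MAX_FAILED_ATTEMPTS = 5

# Matches typical OpenSSH failure lines; the exact log format is an assumption.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def hosts_to_deny(log_lines, threshold=MAX_FAILED_ATTEMPTS):
    """Return the sorted hosts whose failed logins reach the threshold."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(host for host, count in failures.items() if count >= threshold)


def deny_entries(hosts):
    """Format hosts as hosts.deny-style lines for the sshd service."""
    return ["sshd: " + host for host in hosts]


# Made-up sample lines: five failures from one host, one from another.
sample_log = [
    "Mar 24 10:00:01 box sshd[1234]: Failed password for invalid user admin "
    "from 203.0.113.7 port 4242 ssh2",
] * 5 + [
    "Mar 24 10:05:00 box sshd[1240]: Failed password for root "
    "from 198.51.100.9 port 5151 ssh2",
    "Mar 24 10:06:00 box sshd[1241]: Accepted password for alice "
    "from 192.0.2.10 port 6060 ssh2",
]

print(deny_entries(hosts_to_deny(sample_log)))
```

Only the host that reaches the threshold ends up in the deny list; the single failed attempt and the successful login are ignored.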

And never. Never open direct root access over ssh. Ubuntu goes further: AFAIK there's no way to log in interactively as root by default (only sudo works), but nevertheless, close this down.

Monday, March 23, 2009

Google Web Toolkit (GWT) Search Engine Optimization (SEO) demo will be soon

I'm going to create a little demo showing how one can make Ajax-loaded content indexable by search bots.

The basic idea is to create a "secondary site", a static one, showing EXACTLY the same content as the primary one. If a bot gets to the secondary site, it can index it well because of its static nature.
If a user gets to a specific page crawled by the bot, he just gets redirected to the dynamic version of the site, which shows the same content.

Here are some key tools I'm going to use:
  • OpenSymphony SiteMesh - a good decorating filter. It would allow us to wrap the entire page into some kind of template. For example, we could put a navigation header/footer for bots and old browsers and a redirect statement for modern browsers, deciding by user-agent.
  • Some Java-based CMS for generating/accessing content. Not decided yet. Even better would be to use a wiki engine.
  • Google Web Toolkit, surely.
  • Tomcat as a servlet/jsp container.
Right now it seems to me that I don't need anything more.
So, wait for the demo. I hope I'll have a few spare hours to get my hands on this interesting stuff.
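
Until the demo is ready, here is a rough sketch of the "decide by user-agent" step. It is written in Python just to keep it short (the demo itself would be Java), and the bot markers, the application URL and the function names are assumptions, not part of SiteMesh or GWT. It also simplifies things: old browsers are not handled separately.

```python
# Hypothetical substrings used to recognise crawlers; a real list would be longer.
BOT_MARKERS = ("googlebot", "msnbot", "slurp")

# Hypothetical entry point of the dynamic (GWT) application.
DYNAMIC_APP_URL = "/app.html"


def is_bot(user_agent):
    """Guess from the User-Agent header whether the visitor is a crawler."""
    agent = (user_agent or "").lower()
    return any(marker in agent for marker in BOT_MARKERS)


def respond(path, user_agent):
    """Return (status, target) for a request to a page of the static site.

    Bots get the static page itself; everyone else is redirected to the
    dynamic application, with the page path carried as a history token.
    """
    if is_bot(user_agent):
        return 200, path
    return 302, DYNAMIC_APP_URL + "#" + path.strip("/")


print(respond("/docs/intro", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(respond("/docs/intro", "Mozilla/5.0 (Windows; U) Firefox/3.0"))
```

The important part is that both branches lead to the same content for the same page; only the presentation differs.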

By the way, there's one interesting problem I can see now: if I put static content inside an HTML panel, this would cause problems with navigation and links. For a wiki, especially. Any link displayed would lead us out of the dynamic application, which is not desired.

An alternative solution is to put the content into an IFRAME element. I don't think I really like it, but...

Friday, March 20, 2009

.NET 1.1 LDAP performance: System.DirectoryServices.DirectorySearcher lags!

This short posting describes a performance problem with .NET 1.1 LDAP (in particular, the System.DirectoryServices namespace). The problem was solved.

VB.NET 1.1 performance problem with System.DirectoryServices.DirectorySearcher
While working on one of my projects (VB/C# .NET 1.1), I ran into strange performance problems. Controls were loading slowly. After some research, I narrowed the problem down: the server made several requests to Active Directory over the LDAP protocol. Each of those requests took about 7-8 seconds to complete. The same program, written in Java, worked blazingly fast.
Honestly, I spent several days fixing this problem.

Problem is caused by DirectorySearcher implementation
The problem was actually caused by the System.DirectoryServices library (really a COM wrapper) having some lags with memory allocation, or something like that. I'll add a link once I find it in my bookmarks.
So, each time a request is performed, it first spends about 7 seconds on authentication, or something like that.

Possible solution (worked for me)
Honestly: I did not manage to fix this properly. What I did was just plain caching. I saved a reference to the System.DirectoryServices.DirectorySearcher in a static variable (not even an instance variable, but a class variable!), and pinged it periodically.

Caching for DirectorySearcher gave expected results
Yes. The first query was awfully long. But subsequent calls re-used the initialized object, and thus worked blazingly fast! So far there are no side effects: the application is a web application, and its uptime is more than 2 months.
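
The pattern itself is language-independent, so here is a small sketch of it in Python rather than VB.NET. SlowSearcher is a made-up stand-in for an object that is expensive to initialise; it is not the .NET class, and the names are hypothetical.

```python
import time


class SlowSearcher:
    """Stand-in for an object that is expensive to initialise."""

    instances_created = 0

    def __init__(self):
        SlowSearcher.instances_created += 1
        time.sleep(0.05)  # simulates the slow first-time setup

    def find(self, query):
        return "result for " + query


class SearcherCache:
    """Keeps one shared searcher in a class-level variable."""

    _searcher = None  # shared by all callers, like a static field

    @classmethod
    def get(cls):
        if cls._searcher is None:
            cls._searcher = SlowSearcher()  # pay the setup cost only once
        return cls._searcher

    @classmethod
    def ping(cls):
        """Run a cheap query so the cached object stays in use."""
        return cls.get().find("ping")


first = SearcherCache.get()
second = SearcherCache.get()
print(first is second, SlowSearcher.instances_created)
```

Note that this sketch is not thread-safe; in a web application the first initialisation would probably need a lock around it.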

Useful links on DirectorySearcher performance problem

Wednesday, March 18, 2009

Afraid of being banned by Google for 'cloaking'?

Cloaking, basically, is the idea of presenting different content to normal users and to bots. This malicious technique is used by bad people to increase their ranking at Google and get traffic, while presenting the user with content he did not really request. And there is some good news at the end of the article :)

Example of cloaking (do not do this!)
Here is just a rough example. Imagine there's a page on the web with the URL http://exampledomain.url
If the Google crawler gets to this page, it is presented with one version of the page: clean, structured, full of text, headings, etc. This version of the page has lots of keywords, say 'free software download'. It is ranked well by the Google bot.
The site can tell who is 'knocking on the door': a machine or a real user.
So, when a real internet user comes to this page, he sees the face of some PR company instead.

This is considered to be cloaking, and the site may be removed from Google's indices by Google personnel once they discover such bad behavior.

Problems for good guys
Ok, this anti-cloaking policy makes search indexes cleaner, more readable and more reliable. But what if you have a fully dynamic site? Yes, a site which is fully powered by JavaScript, or Flash?
If its navigation is done with JavaScript, you have problems.
If your application is built as a Single Page Interface, there's no correct way to make it indexable. (This is how the guys from Redmond understand SPI.)

Good news (for good guys only)
"The only hard and fast rule is to show Googlebot the exact same thing as your users."
This is very good news (yes, for me it's news; I know that the posting was published in '07). It means that your site will not get banned if you show the same content for the same URLs, but in a different way.
So, some links:
Ok. Gone for now - will post something on this topic soon...

SEO and GWT

I'm going to research a bit what's going on in the GWT and SEO world now...
If you don't know either of those terms: GWT stands for Google Web Toolkit, tooling for creating rich web applications, and SEO stands for Search Engine Optimization.

So, getting my hands on it...
In this posting, just 2 links to the documents I'm going to work through first:
  1. GWT Google Group posting about SEO and GWT. Quite an old thread, so something may have changed since then. People who are well known in the GWT community take part in it, such as Sanjiv Jivan and Ian Bambury.
  2. A whitepaper on GWT and SEO from BackBase. I've only started reading it, so I cannot give my opinion yet.
I hope that SEO is applicable to such complex and cool applications as the ones built with GWT :)

Tuesday, March 17, 2009

Spiral Universe school management system

I'm not interested in school software.

But this startup - Spiral Universe School Management System - seems to be emerging fast and might be interesting.

Usually, such systems solve some specific task, like schedule management, a class journal or a medical log, but this system combines the social network idea with school management tools.
The business model seems to be flexible enough: you can use the system for free, but you may pay for additional services/support. Additional user space for media can also be bought.

This startup seems to be oriented towards fresh technology. I spotted GWT on the Spiral Technologies page, so here it goes.

Waiting for a live demo or a launch to try this out :)

Monday, March 16, 2009

Forests-Forever are CLOSING

There are only two weeks left to see the grandeur of forests from all over the world.
I'm not promoting it - just give it a try.

There are really beautiful pictures of forests from all over the world. I liked the ones from the Urals, Russia, the most. They give you a little breath of fresh air.

While watching those photos I wanted to leave that job, get out of the city and just be there...

So, only two weeks left for Forests Forever. Don't miss it.

Getting UNIX to work

I like the UNIX operating system family. I would not switch to window$ now, maybe I could live with Mac (it's BSD, :)).
As you may have heard before, UNIX can be hard to get working the way you want. It may require some ... actions from you :)
I got this picture from Kirill Scorp Malyshev ;).

Example of the job done with GWT

Just found a site I developed some time ago.
It's done in pure GWT; when I was finishing it, it seemed veery cool to me.

For English-speaking guys: sorry for the Cyrillic :)

So, here's the link: Металлопластиковые окна в Херсоне: ТМ Основа (PVC windows in Kherson: TM Osnova)

It is very simple. It makes use of the GWT history support (try clicking the back button), and it has some heavy-duty stuff done with GWT and a modified Canvas widget from the GWT widget library.

To see it, you can go to Посчитай! -> Выбрать тип конструкции (Calculate! -> Choose construction type).

The wireframe pictures of the window shapes are drawn right here, on the client! For Opera/Safari/Mozilla the canvas element is used, and for IE6/7, VML. Looks good, doesn't it?