Wednesday, March 13, 2013

Portecle - a nice tool to manage certificates, keystores, keys, etc., instead of keytool (JDK)

I had always been using keytool from the JDK installation. However, when it came down to the BouncyCastle implementation, too much work was needed to get it running: download a crypto provider, install it, etc. Here is the description: http://stackoverflow.com/questions/4065379/how-to-create-a-bks-bouncycastle-format-java-keystore-that-contains-a-client-c
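The approach described in that answer boils down to pointing the plain JDK keytool at the BouncyCastle provider jar. A sketch of the kind of invocation involved (the alias, file names, password, and bcprov jar version are all placeholders, not taken from the post):

```shell
# Import a server certificate into a BKS-format truststore, using a
# separately downloaded BouncyCastle provider jar (paths are examples).
keytool -importcert \
  -alias server \
  -file server.crt \
  -keystore truststore.bks \
  -storetype BKS \
  -storepass changeit \
  -providerclass org.bouncycastle.jce.provider.BouncyCastleProvider \
  -providerpath bcprov-jdk15on-147.jar
```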

As Android uses BouncyCastle by default, I had to pack the public key of the RSA pair into a BKS store to make a resilient HttpsURLConnection.
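Once the BKS truststore is packed with the app, it gets wired into HttpsURLConnection through a custom SSLContext. A minimal sketch using only standard JSSE classes (the "BKS" store type is available where BouncyCastle is registered, as on Android; on a desktop JDK you would pass "JKS" or "PKCS12" instead):

```java
import java.io.InputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class BksTrust {

    // Builds an SSLContext that trusts only the certificates found in
    // the given keystore stream (e.g. a BKS file packed as a resource).
    public static SSLContext trustedContext(InputStream keyStoreStream,
                                            char[] password,
                                            String storeType) throws Exception {
        KeyStore trustStore = KeyStore.getInstance(storeType);
        trustStore.load(keyStoreStream, password);

        TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

An HttpsURLConnection then picks it up via connection.setSSLSocketFactory(ctx.getSocketFactory()).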

I found http://portecle.sourceforge.net/ very nice and handy as a GUI tool.

Works flawlessly; recommended.

Tuesday, March 12, 2013

Linux/Mac alternative to Fiddler - WebScarab!

I really enjoy using Fiddler - a web logging proxy or, as it is also called, a web debugger.

It works in a very simple way:

  1. When started, it modifies the Internet Explorer proxy settings to point at itself.
  2. All requests are logged and ready to be analyzed afterwards.
  3. As much software uses the IE proxy settings for its own setup, non-straightforward things such as SOAP calls, Ajax, and embedded-browser calls may also be captured.
  4. With the captured interchange, one can analyze every aspect of a single roundtrip.

The only thing I dislike about it is its nature: built on .NET, it works flawlessly on the Microsoft platform, but not on Linux.

The good, working alternative I have found is the WebScarab project, which is written in Java, looks pretty much the same, and does most of the same job. Easy: just run it with java -jar and set up your software to proxy through 127.0.0.1:8008.
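Since most JVM HTTP clients honor the standard proxy system properties, pointing a Java app at the listener can also be done in code (host and port are the 127.0.0.1:8008 listener from above; whether a particular HTTP library respects these properties varies):

```java
public class ProxySetup {

    // Routes the JVM's default HTTP/HTTPS handlers through a local
    // debugging proxy such as WebScarab on 127.0.0.1:8008.
    public static void configure(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    public static void main(String[] args) {
        configure("127.0.0.1", 8008);
        System.out.println(System.getProperty("http.proxyHost") + ":"
                + System.getProperty("http.proxyPort")); // prints 127.0.0.1:8008
    }
}
```

Call configure(...) before opening any URLConnection, since the defaults are read when a connection is made.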

Monday, March 11, 2013

Web-harvest scraper 2.1 - how to use

It often happens that you need to get some data from a web source, but the developers of the site either disallow it or simply do not have the resources to implement any B2B API.

That's where scrapers come in. They are pieces of software that act as a browser but provide a programmatic API for processing the results in program code.

During my search for an acceptable solution, I came across http://web-harvest.sourceforge.net/; its nice features are:

  1. All-java, used as a library
  2. Great versatility - it is really packed: XQuery, XPath, regex searches, emulation of browser activity, templating and variables, integration with different scripting languages.
  3. A nice UI workbench, which allows you to develop scripts easily and see the results immediately. Then just save the XML configuration and invoke it from the code.
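For a taste of what such a saved configuration looks like, here is a minimal config in the style of the web-harvest sample configurations (the URL and variable name are made up for illustration; 2.1 may behave slightly differently, as noted below):

```xml
<config charset="UTF-8">
    <!-- Download a page, convert it to well-formed XML, and store
         the result in the "content" variable for later processing. -->
    <var-def name="content">
        <html-to-xml>
            <http url="http://www.example.com/"/>
        </html-to-xml>
    </var-def>
</config>
```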
However, the version under the link above is 2.0. It is not present in Maven, and I am building with Maven. There is a newer 2.1 version, which has been heavily reworked: a Maven build process, a switch to Guice injection, etc.

I like 2.1 a lot, but there are some issues with it: no documentation at all, and slightly different behavior. I made a fork for myself on GitHub (https://github.com/lexaux/web-harvest) and am applying changes there. Hopefully I will be able to contact the developers and contribute.

For now, here is a quick how-to on running the 2.1 web-harvest scraper in your code (the UI is pretty straightforward). It is really different from 2.0. So, here it goes: