What does it take to nearshore SW development to Greece?

22 08 2012

I put this question to the Greek IT network on LinkedIn:

I was wondering what is stopping Greek SW developers from getting organised and offering nearshoring services to European companies. I am looking for people for our Indian office and the salaries are comparable to what I expect to find in Greece and certainly higher than Romania, Bulgaria or Serbia. From my experience, I would say Greek developers don’t get the chance to work on full-scale projects through the whole lifecycle. So I suspect there is a lag in professionalism, at least for the freelancers. I had the opportunity to work on my own project end-to-end from development to sales and I’d say if you’ve never gone from product to client it’s difficult to grasp the whole thing.

I then asked the respondents to fill in a survey that would show us which factors are limiting nearshore activity in Greece. Here are the results from the 30-some responses (until 16/08/2012):


And the big one:

Three factors score similarly (~50%): lack of confidence, short-term strategies and stability in the country.



Innovation showcase: Norton commander clone

16 02 2010

This is great! With Windows 7 you don’t just get DOS under the hood but also Norton Commander!


4 01 2010

This is my first iPod. An old green iPod mini with 4 GB of space that I bought a long time ago (2004). It still works great. This is my newest one: an iPod touch 3G released in October 2009. It is the smallest model, with 8 GB. I bought the first one for 200 Euros. I bought the second one for… 200 Euros. I consider the latter overpriced given the competition and the actual cost of components. There’s probably a huge profit margin there. The two devices have one thing in common: the awful iTunes software. The “see nothing, hear nothing, say nothing, do nothing” implementation lives on. Apple could spend a penny or two to have it refurbished, though I think they don’t do that on purpose. Besides the fact that the iJunk piece of software has been the same for so many years, a resource sucker with little – if any – added value, I cannot install it on my WXPx64 machine! That platform is not supported, for some obscure reason. Not even the 32-bit version can be installed. Just terrible.

Apart from that, the new device works smoothly and nicely. I got it in order to play with application development. I haven’t tried anything yet, but the platform is very … proprietary… Applications can only be distributed through the App Store, for a price that the developer has to pay. Let’s see…

So it seems that, as with most very promising things, you get little in the end. What you can do with the device is limited. There is no possibility to use external cards (for example SD) to extend the memory or view photos and the like. There is no card slot, and the only possibility would be to get a bulky docking station with slots. You cannot very easily transfer files (like PDFs) to read. All in all a proprietary platform in a rustique packaging that limits itself by vice to doing only a few things. I just hope Android devices are better than this…

Data at the core of the business

13 05 2009

Experimental engineering software projects start with some urge to have a prototype where you can “see” or “do” something: see a finite element mesh, do a specific calculation. Eventually the prototype grows into a full-featured application that does a whole lot of things. In the beginning, the prototype would read a simple text file with a parser, or an XML file. You thought this would be enough for years to come, but the next time you check, the requirements have grown so much that the application is overwhelmed with data and data needs. On top of that it’s as slow as if you were back on a 386DX with a math co-processor. Now you have to switch to binary. And then it does not work on Windows<->Linux. Developing a full-fledged, platform-independent binary format is not a bad option, but then it does not version on its own, then comes large file support, different clients need to access the data, you’ve hard-coded the business logic to work with a particular file and now you want another one, 64-bit support and other exotic platforms, multi-threading, and the list goes on and on…
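
The Windows<->Linux breakage mentioned above usually comes down to byte order, type sizes and padding that differ between platforms. A minimal sketch of pinning the layout explicitly, using Python's standard struct module; the record layout and the function names are made up for illustration:

```python
import struct

# Hypothetical record for one mesh node: an id and three coordinates.
# "<" forces little-endian byte order and standard sizes without padding,
# so the bytes are the same whichever platform writes them.
NODE_FORMAT = "<I3d"  # uint32 id, three float64 coordinates


def pack_node(node_id, x, y, z):
    """Encode one node as a fixed-size, platform-independent record."""
    return struct.pack(NODE_FORMAT, node_id, x, y, z)


def unpack_node(raw):
    """Decode a record produced by pack_node."""
    node_id, x, y, z = struct.unpack(NODE_FORMAT, raw)
    return node_id, (x, y, z)


raw = pack_node(7, 0.5, 1.5, -2.0)
node = unpack_node(raw)
```

This only solves the first item on the list. Versioning, large files and shared access are still open, which is where a ready-made library starts to look attractive.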

Eventually every such project finds itself revolving around data. And that’s normal: data is a significant and invaluable part of any engineering or scientific software project. It’s not easy to write a good persistence library. I’ve done it in the past and there’s a whole lot of tricks and catches. Usually home-made solutions suffer from poor scalability and robustness, as well as poor performance. It’s natural to ask oneself “what do other people do?” Like finite element people, scientific projects with tons of data, research centres and so on. There’s a limited list of Open Source projects to select from. I have tested a few in the past and ended up enjoying working with two: Metakit and HDF5. I plan to test Hadoop at some point as well. For the moment I will stick to HDF5.

Why HDF and why 5?

HDF5 stands for Hierarchical Data Format 5, though I only know versions 4 and 5, no 1, 2, 3… In times when Open Source projects have all kinds of fancy and exotic names, one can tell the age of the HDF5 project… You can smell the 80’s academia from miles away; you can even see some girls wearing shoulder pads there in the back as well… I myself used it for the first time in the late 90’s while emitting my own fragrance of academia. However, the old-timer is quite up and running. Now it is managed by The HDF Group, a non-profit organisation, spun off like everybody else, and to me it seems that now is the time for HDF5 to show its strength. Obviously the people behind HDF5 have put quite some effort over the last decades into coming up with the right solution at the right time.

So what’s HDF5 after all…

HDF5 is a library that provides a convenient API for persisting scientific/engineering data. Scientific data is regarded here as large arrays (numbering millions of elements) of primitive or composite types. These arrays can be multidimensional. The HDF5 API provides a way to describe this data, arrange it into a hierarchical structure and access it. In this respect it is self-describing, in the sense that the way the data is accessed in the file does not depend on some application-defined object tree. Rather, a format is described and this description is stored along with the data within the file. To make this clearer: one may write an application (though there are already a couple, like HDFView and HDFExplorer) that can open, traverse and access any HDF5-compliant file, created by any other application.

The data within the file is arranged in a tree-like structure consisting of nodes. Strictly speaking the structure is a graph, since links can make it cyclic. The inner nodes of this tree are groups, and each group can contain further groups and datasets. A dataset is the detailed description of the data along with the data itself. Simple examples of datasets could be an array of n doubles, or an array of n structures consisting of several primitive (or integral or atomic) types. The HDF5 API provides the capability to traverse this tree and access each of the datasets. However, one does not need to walk the whole tree to look up a dataset: this is done using persistent addressing. Persistent addressing takes the form of a path. Therefore the user simply needs to access a dataset by path to extract the information needed.

If this sounds complicated, consider these analogies. An HDF5 file looks like a Unix file system: there’s a root, directories are called groups and files are called datasets. You can even have soft and hard links. The path composition is identical to Unix paths: the path /Foo/Bar/… would give access to a particular group at a particular level. The path /Foo/Bar/MyPreciousData could provide access to a particular dataset. Another way to see the HDF5 file is as a binary XML file. It’s just like XML in the way data is hierarchically arranged, only much richer in complexity and functionality. In principle you can map any XML file to HDF5, while it’s not always possible the other way around.
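
To make the file system analogy concrete, here is a toy model in plain Python. It is not the HDF5 API: nested dicts stand in for groups, lists stand in for datasets, and the names lookup and walk are invented for this sketch.

```python
# Toy model only: dicts play the role of groups, lists the role of datasets.
root = {
    "Foo": {
        "Bar": {
            "MyPreciousData": [1.0, 2.0, 3.0],
        },
    },
}


def lookup(tree, path):
    """Resolve an absolute, Unix-style path such as /Foo/Bar/MyPreciousData."""
    node = tree
    for part in path.strip("/").split("/"):
        if part:               # "/" alone resolves to the root group
            node = node[part]  # raises KeyError if the path does not exist
    return node


def walk(tree, prefix=""):
    """Yield (path, kind) for every group and dataset below tree."""
    for name, child in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            yield path, "group"
            yield from walk(child, path)
        else:
            yield path, "dataset"


precious = lookup(root, "/Foo/Bar/MyPreciousData")
contents = list(walk(root))
```

A generic viewer does what walk does here: it needs no knowledge of the application that wrote the file, only the ability to follow groups down to datasets.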

Since datasets are primarily multidimensional arrays of numerous elements of identical type, it is usually faster to process them in batches. HDF5 is optimised for performing I/O on chunks of data. However, it also provides the capability of accessing subsets of a dataset, using the hyperslab concept. A hyperslab is a regular selection pattern, described per dimension by a start, a stride, a count and a block size. Several hyperslabs can be combined using set operators (such as OR and AND), therefore providing the capability of extracting particular information from the dataset. Patterns defined this way may extend to all dimensions of the array.
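
The hyperslab idea is easier to see with numbers. The sketch below is plain Python, not the HDF5 library: it only reproduces which indices a regular pattern picks, using the same four parameters (start, stride, count, block) per dimension. The helper names and the sample array are assumptions made for illustration.

```python
def hyperslab_indices(start, stride, count, block):
    """Indices selected along one dimension by a regular hyperslab.

    count blocks are taken, each block elements long, and the
    blocks begin stride elements apart, starting at start.
    """
    return [start + i * stride + j
            for i in range(count)
            for j in range(block)]


def select(array2d, rows, cols):
    """Extract the elements at the given row and column indices."""
    return [[array2d[r][c] for c in cols] for r in rows]


# A 6 x 8 array whose value encodes its own position: 10 * row + column.
data = [[10 * r + c for c in range(8)] for r in range(6)]

rows = hyperslab_indices(start=1, stride=2, count=2, block=1)
cols = hyperslab_indices(start=0, stride=4, count=2, block=2)
subset = select(data, rows, cols)
```

Here rows 1 and 3 are combined with columns 0, 1, 4 and 5, so the selection is a 2 x 4 subset of the original array, described by eight small numbers instead of a list of positions.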

How is HDF5 related to a Data Base Management System?

HDF5 is similar to a Data Base Management System (DBMS) in the sense that it provides the means to define the organisation of the data within persistent storage, as well as data structures to deal with very large amounts of data. It also satisfies, to some extent, the Atomicity, Consistency, Isolation, Durability (ACID) properties. It also defines an extensible data model that does not depend on any client application and therefore is not affected by application requirements and their changes.

But any similarities stop here. HDF5 is different to a DBMS in the sense that it does not provide a database query language and report writer, nor a security and transaction mechanism that can handle multiple simultaneous accesses to the database. HDF5 operates on single files or sets of files through the use of the library’s API. It does not provide an application that accesses these files. This is the task of the client to implement. Therefore security and simultaneous access have to be handled at that level. But what’s important to understand here is the different scope of the two systems: a database efficiently handles large numbers of transactions consisting of small pieces of data. HDF5 handles one or only a few transactions consisting of large amounts of data. It is important to identify the right tool for the right task.

How does HDF5 compare to serialization?

I would classify HDF5 on the persistence side. But first of all, what is serialization and what is persistence? I would say that:

Serialization is collapsing a multidimensional object tree to a one-dimensional array of bytes. Even more important, a serialization framework can get that array of bytes and reconstruct the object tree and restore the exact state at the moment of serialization. Random access to the data is not possible. It’s all or nothing. The main consumer of serialization is the application that serialized the data in the first place or some descendants of that, in the latter case making versioning an important issue.

Persistence is mapping a multidimensional data structure to another multidimensional data structure in an application independent manner in order to persist that part of the state that is needed by other applications in space and time.
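
The contrast between the two definitions can be sketched with the Python standard library. This is a toy illustration under my own assumptions, not how HDF5 stores anything: pickle plays the serialization framework, and a tiny self-describing byte layout (the "f8le" tag and the helper names are invented) plays the persistent format.

```python
import io
import pickle
import struct

# A small object tree standing in for the application state.
mesh = {"node_ids": [1, 2, 3], "coords": [0.0, 0.5, 1.0]}

# Serialization: the whole tree collapses into one byte string and is
# restored as a whole. There is no way to ask the blob for element 2 only.
blob = pickle.dumps(mesh)
restored = pickle.loads(blob)


def persist(values):
    """Write a type tag and an element count, followed by the raw doubles."""
    header = struct.pack("<4sI", b"f8le", len(values))
    return header + struct.pack(f"<{len(values)}d", *values)


def read_element(buf, i):
    """Read element i alone, using only the description stored in buf."""
    stream = io.BytesIO(buf)
    tag, count = struct.unpack("<4sI", stream.read(8))
    if tag != b"f8le" or not 0 <= i < count:
        raise IndexError(i)
    stream.seek(8 + 8 * i)  # 8-byte header, then 8 bytes per double
    return struct.unpack("<d", stream.read(8))[0]


stored = persist(mesh["coords"])
third = read_element(stored, 2)
```

The reader of stored needs neither the writing application nor its objects: the header says what the bytes are, and any single element can be reached by seeking.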

HDF5 compares little to serialization frameworks. It is rather used in an orthogonal manner: its main task is to persist data. It does not generate a one-dimensional mapping, but rather a multidimensional mapping. Random access to the data is possible. The data is supposed to be consumed by applications other than (but not excluding) the one that created the data, without the need to share business objects. Versioning is not an issue, as the data is self-describing. Further, serialized data can be persisted using HDF5.

Some major advantages

  1. HDF5 operates on single local files that can be transferred using OS commands.
  2. HDF5 provides the means to arrange information in a hierarchical tree and access it randomly independently of the application that generated the information.
  3. HDF5 provides a dynamic mapping paradigm.
  4. Performance-wise, it is often faster than native calls.

So are there any disadvantages?
The main disadvantages are:

  1. It is an open format and self describing, therefore making the data transparent and completely open. This could be a drawback for proprietary applications. Actually it is one of the major ones.
  2. There is no simultaneous access support at the dataset level. This means that it is not possible to write to the same dataset from multiple clients and read from multiple clients at the same time.
  3. Locking/unlocking, journaling, composing transactions are things that are either not implemented yet or lie completely on the application side.
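
What "on the application side" can mean for the last two points is sketched below with a lock around a shared dataset. It is a toy in-memory stand-in, not HDF5 code; the class name and its methods are invented, and the lock only protects threads within one process (several processes sharing a file would need file-level locking instead).

```python
import threading


class GuardedDataset:
    """Toy in-memory stand-in for a dataset shared between threads."""

    def __init__(self, size):
        self._values = [0.0] * size
        self._lock = threading.Lock()

    def write(self, offset, values):
        # Only one writer or reader touches the data at any time.
        with self._lock:
            self._values[offset:offset + len(values)] = values

    def read(self, offset, count):
        with self._lock:
            return list(self._values[offset:offset + count])


dataset = GuardedDataset(8)

# Four writers, each filling its own two-element slice.
writers = [
    threading.Thread(target=dataset.write, args=(i * 2, [float(i)] * 2))
    for i in range(4)
]
for thread in writers:
    thread.start()
for thread in writers:
    thread.join()

snapshot = dataset.read(0, 8)
```

Serialising every access like this is the bluntest possible policy. It is safe, but it also shows why a DBMS with real transactions is the better tool when many clients write small pieces of data.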

All in all HDF5 is an extremely rich library that offers the versatility engineers and scientists need when it comes to writing their data. The data itself is invaluable and has to be safely persisted. Sometimes data is more important than its source. Consider for example scientific experiments that cannot be repeated very easily. HDF5 makes sure we find the data where we put it in the first place. You can access it using nice bindings for C, C++, Java, Python, there’s something for Ruby I think and emm.., what was it now… oh, Fortran! COMMON, EQUIVALENCE, DIMENSION and the like.

Next time I have a little time I’ll write some examples on how to use HDF5. It’s pretty intimidating if you haven’t seen OpenCASCADE, but one can put it into action quickly.

Quote: I want to program but I am an engineer…

10 01 2009

Once again, in one of the threads I follow, I read yet another engineer writing “I am only an engineer and have no formal software development education… how do I do this or that?” From time to time I read the same from engineers and scientists who regard software development as something detached from their education, background and day-to-day practice. For me this is astonishing for two reasons:

1. How can it be that these people never got to program anything?
2. Why are they led to believe that software engineering is so different to any other kind of engineering?

I will start a series of posts related to the topic “Software Engineering for Engineers”!