About a month ago, I started my new job at a fresh startup. It’s self-funded but with a cool idea that addresses a particular industry’s needs. I’d like to talk about my experience but without expressly talking about the product itself, so for now, visualize it as a super-customized version of Dropbox for a particular industry. The market is solid, the idea is solid, and the feedback is solid. All those things made me join as the sole engineer and take on the responsibility of creating the first prototype and bringing on the initial architecture.
First, let’s discuss the stack. I decided to go with Node on the back-end and Express as the lightweight framework for middleware, routing and the server itself. The back-end is largely an API data layer rather than a full-on MVC-style CRUD app (or whatever else). It’s complemented by Angular on the front-end, styled with Sass and a custom framework.
The app has an OCR layer as well using a piece of software called Tesseract. All of the information is stored in PostgreSQL in various formats.
I decided to go with Node for several reasons. One of the main ones was my first-hand experience with it, which allowed me to prototype rapidly. This was one of the requirements for the job (move fast, break things, that whole thing). I’ve also used it in high-performance environments where scale was a big deal, so I knew that, at least for a data layer, Node would work very well. But so far, I’ve run into some problems.
- Development Speed – with free structure, easy package management, and server reloading, you can’t go wrong with Node for a prototype.
- Modular Applications – with no namespace issues and the require system allowing pure functions to be shared across various parts of an app or multiple apps, you’ve hit a jackpot. I combined logic from the OCR and my API system seamlessly. It’s definitely much easier than in other languages.
- Packages – huge package ecosystem. Most problems have multiple solutions.
- Isomorphism – the ability to share back-end and front-end logic. When I eventually migrate to ReactJS, this will be more apparent.
- Fluid structure – it’s very easy to set up an app customized to your needs. Don’t need a view layer? No problem. Don’t even need models for your DB? Not an issue!
- Orphan and obsolete modules – the number of abandoned and out-of-date modules is staggering. Half the packages I tried to find to solve a niche problem were 2-3 years old, unsupported or simply not working (or, better yet, no docs).
- Missing libraries – a lot of CS stuff and non-web stuff simply has no support. Node is hot and people use it as a data layer first and foremost. And it shows.
- Async is a bitch – yes, it’s 2015 and I’m still bitching about async. No matter how you try to control the flow, whether with promises or async or threads or whatever, async is still a problem and most “solutions” feel like duct tape. I’ve learned about streams, I’m using them and it just feels wrong to have to go to these lengths to do something as simple as make a couple of DB calls.
- Evented system – it’s not what it’s cracked up to be, and making everything “async” is not a solution. Unless you’re offloading computation or work to another piece of software, NodeJS can be as blocking as anything else. Just because you stick async in front of it won’t make it work better. What you think you’re getting is parallelism (because that’s what everyone makes it sound like); what you’re actually getting is concurrency. Read the cool SO answer on this matter.
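To make the concurrency-versus-parallelism point concrete, here is a minimal sketch (the names events and sumTo are mine, purely for illustration). A timer asked to fire "immediately" still has to wait for CPU-bound work, because both share the single JavaScript thread:

```javascript
// Record the order in which things actually happen.
const events = [];

// CPU-bound work: nothing else can run on the event loop until this returns.
function sumTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i;
  return total;
}

// Ask for the callback as soon as possible (0 ms delay).
setTimeout(() => events.push("timer fired"), 0);
events.push("timer scheduled");

// This blocks the single JS thread; the timer cannot fire in the meantime.
const total = sumTo(1e7);
events.push("blocking work finished");
```

The timer callback only runs once the synchronous code has finished and control returns to the event loop. Calling the work "async" changes when it runs, not how many things run at once.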
Honestly, I want sync back. Most of the stuff you do with web work is just: intake request -> get data -> process -> respond. And only one of those steps is going to be async and it’s not really that helpful.
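For reference, this is roughly what that intake -> get data -> process -> respond flow looks like with promises. It is only a sketch: findUser and findFiles are hypothetical stand-ins for real database queries, and the shape of the response is an assumption.

```javascript
// Hypothetical stand-ins for two database queries; each resolves asynchronously.
function findUser(id) {
  return Promise.resolve({ id: id, name: "user-" + id });
}
function findFiles(userId) {
  return Promise.resolve([{ owner: userId, name: "scan-001.png" }]);
}

// intake request -> get data -> process -> respond
function handleRequest(request) {
  // Get data: the only genuinely asynchronous step.
  return Promise.all([findUser(request.userId), findFiles(request.userId)])
    // Process and respond: plain synchronous code once the data is in.
    .then(([user, files]) => ({
      status: 200,
      body: { user: user.name, fileCount: files.length },
    }));
}
```

Calling handleRequest({ userId: 7 }) returns a promise for the response object. Even in this tiny case, a couple of DB calls already need Promise.all and a then around what is otherwise straight-line logic.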
To be honest
I’ve become largely disillusioned with Node, especially with my OCR work on it. Having to write my own libraries for something that most other languages have a standard library for sucks, but when I do write it, pretty much everything has to be async, which creates callback hell, or you use something with sync in the name, and then you just feel like you’re doing shit wrong. Think about ImageMagick, Leptonica, Tesseract, or other software like that. Node just doesn’t deal well with it.
And if it does, the developers that built the wrapper libraries are gone, probably coding in Go, having implemented only a quarter of the available APIs.
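One common way to flatten the nesting is to wrap callback-style functions in promises. The sketch below assumes two hypothetical Node-style functions, cleanImage and recognizeText, standing in for image and OCR wrappers; they are not real library calls.

```javascript
// Hypothetical Node-style callback APIs standing in for image/OCR wrappers.
function cleanImage(path, callback) {
  setImmediate(() => callback(null, path.replace(".png", ".clean.png")));
}
function recognizeText(path, callback) {
  setImmediate(() => callback(null, "text from " + path));
}

// Turn a (args..., callback(err, result)) function into one returning a promise.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, result) => (err ? reject(err) : resolve(result)));
    });
}

const cleanImageP = promisify(cleanImage);
const recognizeTextP = promisify(recognizeText);

// Each step chains onto the previous one instead of nesting inside its callback.
function ocr(path) {
  return cleanImageP(path).then(recognizeTextP);
}
```

Newer Node versions ship a util.promisify that does the same job as the hand-rolled helper here. It reads better than nested callbacks, but it is still a layer of plumbing around what is conceptually two sequential steps.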
But outside of this minimalist application, I’m having trouble trusting Node anymore.
I have a love/hate relationship with Angular. So far, my biggest Angular project has been Foundation for Apps, which I architected and largely developed (though I haven’t touched it in a couple of months!), and it taught me a few valuable lessons in Angular:
- Digest cycle sucks. Sometimes it’s running, sometimes it’s not. And sometimes when it’s running, it has no idea anything changed.
- If you don’t do it the Angular way, you’re going to break something.
- Moving data between various depths of scope sucks as well.
- Angular can be a godsend when you figure out how to make reusable directives.
My current app uses Angular to drive everything on the front-end, from routing to permissions to everything else (backed by the back-end of course).
- Development speed – Angular is easy to get up and running, and it’s unobtrusive in the prototype stage.
- Built-in directives are awesome – ng-class and the like let me enforce most of the required logic out of the box.
- Back-end on the front-end – it’s wonderful to have a framework that allows me to mimic the back-end.
- Services, factories, and all of that – are awesome. Separating logic out into these small modules makes the application feel cleaner and work much better. I rarely copy logic these days.
- Directives and Controllers – I’ve built out a few basic components and built extensions on top of those, which allowed me to save a ton of code. Imagine a modal, and an “okay/cancel” extension which allows you to inform a controller of what action took place! I built a series of these “extensions” which helped me reuse basic UI elements but fit my unique situations.
As my lessons indicate:
- Digest cycle – it’s often difficult to figure out when exactly it’s running and what triggered it. I often encounter the dreaded Error: $digest already in progress.
- Two-way binding is great, in theory – in practice it doesn’t exactly work that way. I’ve noticed that Controllers will often “forget” to inform directives of variable changes (so you have to set up a $watch), and directives rarely let their parent know what’s going on. Honestly, this (along with point #1) is why people have started to love Flux so much.
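Both complaints are easier to see with a toy model of dirty checking. This is a deliberately simplified sketch of the idea behind $watch and $digest, not Angular's real implementation; ToyScope and everything in it is made up for illustration.

```javascript
// A toy scope with dirty checking; a simplified model, NOT Angular's real code.
class ToyScope {
  constructor() {
    this.watchers = [];
    this.digesting = false;
  }

  // Register a getter and a listener to call when the getter's value changes.
  watch(getter, listener) {
    this.watchers.push({ getter, listener, last: undefined });
  }

  // Re-run every watcher until no value changes any more.
  // (No iteration limit here, so a listener that keeps changing state would loop forever.)
  digest() {
    if (this.digesting) throw new Error("$digest already in progress");
    this.digesting = true;
    try {
      let dirty;
      do {
        dirty = false;
        for (const w of this.watchers) {
          const value = w.getter(this);
          if (value !== w.last) {
            const previous = w.last;
            w.last = value;
            w.listener(value, previous, this);
            dirty = true;
          }
        }
      } while (dirty);
    } finally {
      this.digesting = false;
    }
  }
}

const scope = new ToyScope();
const seen = [];
scope.watch((s) => s.fileName, (value) => seen.push(value));

scope.fileName = "invoice.pdf"; // changed outside a digest: nobody is told yet
const seenBeforeDigest = seen.slice();
scope.digest(); // only now does the watcher notice the change
```

In this model a change is invisible until something triggers a digest, which is the "it has no idea anything changed" problem. Triggering a digest from inside a listener that is already running in one raises the "already in progress" error.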
I’ve never worked with OCR before, so that’s been an interesting experience. Tesseract (my OCR of choice) is pretty simple to use once you get the hang of it, but what’s interesting is that it’s niche enough that it doesn’t have much community support.
With Node, PgSQL, and other parts of my stack, it’s easy to run into knowledgeable people willing to help. From dedicated subreddits to tons of questions and answers on SO, and tons of tutorials, you have a wealth of resources. Tesseract, on the other hand, has few to no tutorials, no community, and no one to turn to.
It’s been difficult and I had to actually set up a call with an OCR expert just to wrap my head around it.
It was a sad moment and through my OCR adventure, I stumbled into these issues time and time again. There were tons of concepts that simply did not have a “quick start” or “beginner’s intro”, and no articles to help you cut through the academia. This involved image manipulation, various image algorithms and so on.
Honestly, I might have to publish these resources on my own just so that the next guy who wants to know how to set up OCR and what “training files” are can have something to look at.
I’m hoping to post some more impressions from my job. The longer I work with these technologies, the more interesting everything gets, and the more “aha! This is amazing!” moments as well as “Oh shit, this is terrible” moments there will be.