Problems happen to everyone: the SeeWeb story

I'm writing this post quickly, even though I'm tied up in my usual travels, because I didn't want to let this matter fall into the void.

I don't know whether you are aware of it, but SeeWeb -one of the largest Italian housing/hosting providers- suffered a rather "heavy" outage on January 18th in one of its datacenters. SeeWeb's marketing manager explained what happened in a post:

https://blog.seeweb.it/a-quanti-anni-corrispondono-34-anni-ibm/

Just to avoid any possible misunderstanding, let me state upfront that I personally know both the owner of SeeWeb and some of his senior engineers, but I am not involved in the company in any way, nor do I have any relationship with it beyond friendship and being a customer.

I have heard a great many criticisms aimed at SeeWeb and, on a personal level, at some of its engineers (or worse, outright attacks), both on social networks and in various discussions.

As you know, I have contributed to many large projects, up to managing hundreds of thousands of systems, and I can assure you that shit happens. Period.

To all the people who criticised, I would kindly point out that it is easy to point the finger at someone else without ever having "risked" anything first-hand or having run a service yourself, especially for people who at most have managed a few dozen servers in their life (if that).

The only thing I can blame SeeWeb for is the naivety of having trusted a vendor. It's a classic pattern I see, the "someone will support me anyway" attitude. And indeed IBM did support them and they got out of it, but a SAN is still a single point of failure for storage. Make it as redundant as you like: if both controllers go down, you end up in this kind of mess. And I assure you they are not the only ones, and not only with IBM.

What I would like us to reflect on are two things: vendors and customers.

On one side, vendors are increasingly tied to the market and less to the product. Buying IBM (or HP, Dell, Hitachi, ...) once meant buying quality. For those who work at a vendor this is no surprise: unfortunately they have had to bend to market dynamics, meaning it is neither the product nor the customer that is in charge, it is Wall Street. Returns are expected every quarter or fiscal year, so there is a race to push new products and services onto the market, at the expense of understanding whether those products are really ready to be released. Aided by the fact that many things are now done in software, I have often heard people at vendors say "we'll fix it later with an update" or with an after-market hardware fix. And damage like what happened to SeeWeb follows (disclaimer: I don't have the technical details, so I cannot discuss this specific case). Often the problems end up being solved by the customers themselves and by the vendor's support engineers, who are usually willing to help, together with L3 support (the people who write the code), all of whom increasingly suffer this pressure.

On the other side, I would like customers to reflect. There is the type of customer who is not in the trade: I can understand that they bought without really having the tools to decide. For this customer, either an outage doesn't matter that much, or, if they believe the service is critical for their business, my advice is to rely on a real consultant (not the smoke-and-mirrors kind). A different case is the customer -or worse, the customer's consultant- who has delegated too much to the infrastructure and to third parties, forgetting that problems can happen to anyone, even Amazon or Microsoft, or who "saves money" to please the customer because there is no budget. Last case: the customer whose infrastructure is not critical for the business puts it wherever it costs less, and if it goes down, who cares.

Whatever the case, it is always wise to keep a "Plan B" in your pocket; whether that is a Business Recovery or a Business Continuity plan depends on the kind of business a company runs and the services it delivers. Even if a potential outage is not critical for your business, it is always good to keep an off-site, vendor-neutral copy of your data, also in case the service provider suffers a serious incident (I remember a truck once crashing into a datacenter, causing enormous damage). Faced with any damage, temporary or prolonged, the customer then always has the choice of waiting or restoring their systems somewhere else. If instead you are one of those customers who need their systems always up, then I recommend a multi-cloud strategy (even active-active), with applications able to replicate data instantly.
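As a minimal sketch of what such an off-site, vendor-neutral copy could look like -assuming a plain Linux host and a destination at a different provider reachable over SSH; host name and paths are purely illustrative- even a nightly job as simple as this is a decent starting point:

# copy local backups to a different provider over SSH (host and paths are placeholders)
rsync -az --delete /srv/backups/ backup@offsite.example.net:/srv/offsite-copy/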

Having a Business Recovery or Continuity plan is not difficult, nor does it carry enormous costs, especially if you think about how much an outage would cost you. Risk management is a mindset that unfortunately few people have, something that, alas, the SeeWeb case has demonstrated once again.

A summary of 2019 and why I’m not running for OpenStack board again

It's the beginning of the year... that time of year when you can rest a bit and, as most of you do, take stock of what 2019 meant to me.

I can definitely tell you that 2019 was a year full of changes, and I bet that 2020 will be no different 🙂

Last thing first. This year I decided not to run as an individual representative for the OpenStack board of directors. In a previous blog post I spoke about the status of OpenStack; as a project, I would describe it as a "dead man walking".

OpenStack looked promising, but Kubernetes was faster and was able to "take over". Also, many enterprises are shifting to the cloud with a "cloud-first" approach, both for IaaS and for services (e.g. SalesForce). That's why I've seen many Kubernetes deployments in the cloud rather than on-premises.

This is the reason I joined SUSE and left shortly after. You know how much I care about OpenSource and Linux, so I thought it was cool to "complete" my career with all the commercial Linux distributions available. In a few months at SUSE, I learned an invaluable lesson: OpenSource innovation is no longer in the hands of the big "open" vendors like RedHat, SUSE or Canonical; it sits with the big IaaS/SaaS vendors instead. Look at what Facebook, Google, Amazon and Microsoft itself have contributed to OpenSource in recent years.

While I enjoyed the time spent with my colleagues at SUSE, it's crystal clear that the market is moving away from traditional software vendors and embracing "as a service" more and more.

There's another lesson I learned, this time about myself. I "played" at being an entrepreneur in the past years and it didn't go exactly as expected. It didn't go wrong, but I can't tell you it was a success either. It definitely was a success as a personal objective, though, as I wasn't sure I could make it. I learned a lot and understood that I can probably do a better job managing companies than some "alleged" entrepreneurs who manage startups. At the same time, I also understood my limits. I figured out that I'm able to run a company as a whole (sales, marketing, products, legal, ...) and keep it solid. However, I truly believe that being an entrepreneur goes beyond being a good manager with a good product; much of it is about having great connections and being a good influencer.

Will I try again to be an entrepreneur, or challenge myself with a C-level position in the future? Who knows. Meanwhile, I decided it is time to move on and refocus on what I do best, i.e. acting as a trusted advisor for big companies around Europe. I believe I've got the right balance between deep technical knowledge on many subjects and the communication skills needed to interact with upper management.

There's a specific need in the market now. In a cloud-first approach, the selection and integration work between multiple on-line services and on-premises systems will play a key role. I believe better management of the IT budget -especially in the cloud- will be a hot topic in the future, and automation will definitely have a role in all of that. We will face a "consumerization" of IT, especially on the user side, and multi-cloud on the services side. Security will also play a key role, where the "zero-trust" approach and cloud identity management will slowly replace the traditional firewall and VPN.

London is in my heart and probably the closest thing to what I can consider "home", but being a digital nomad will definitely still be a thing for me in 2020. So, I'll see you around in Europe 🙂

A happy and prosperous 2020 to all of you.

Docker Containers & Security

Background

For those of you not acquainted with the latest trends in software development, a Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application. This means a Docker image contains code, runtime, system tools, system libraries, and settings.

In fact, a Docker container is a self-contained image that is easy to distribute. It's also easy to move from development to test and then to production. This portability was key to the successful adoption of Docker containers among developers.

With Docker, distributing applications is very easy and everyone can create and publish their own containers. There is even an official registry for Docker images (https://hub.docker.com/), where vendors and the open source community publish their work. Indeed, there is a vast number of official and unofficial ready-to-use Docker images, even for commonplace applications like databases and web servers.

Docker Security

A lot of security concerns have been raised about containers. Many of them are valid, but many misunderstandings occur because people confuse containers and virtualization, thinking they are the same thing. However, virtualization segregates memory and CPU behind a hypervisor, whereas Docker containers share the host kernel and its resources.

Last month a security researcher found a way to "escape" the jail that Docker creates and replace arbitrary programs on the host system. A malicious program or Docker image can exploit a bug in the container runtime (runC) to gain root privileges on the host running the container. This then allows ill-intentioned players unlimited access to the server as well as to any other containers running on it.

The risks are quite clear.

What can I do? Be Very Careful.

Developers and Operations have to select Docker images with extreme care. The way to exploit the aforementioned bug is through malicious, hidden programs that -before launching the real program- execute the exploitation sequence and inject an evil binary.

Most of the time images are built in-house for custom-made applications, but sometimes 3rd-party images are used for common tasks such as databases, caching (e.g. Redis) and front-ends (e.g. HAProxy). DevOps should avoid images from unknown/untrusted sources.
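Two simple precautions, sketched here with illustrative image names: enable Docker Content Trust so the engine only pulls signed tags, and pin third-party images to an immutable digest rather than a tag that can be re-pushed.

# refuse unsigned images when pulling by tag
export DOCKER_CONTENT_TRUST=1
docker pull redis:5.0

# or pin to a digest you have verified out of band (placeholder digest)
docker pull redis@sha256:<digest-you-verified>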

If you can, create your own images, even for common applications, starting from well-known and trusted sources, such as the operating system itself or from the application vendor.
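A minimal sketch of what that can look like, assuming a Debian base image and Redis installed from the distribution repositories (image names, registry and versions are only examples):

# write a minimal Dockerfile on top of an official OS base image
cat > Dockerfile <<'EOF'
FROM debian:buster-slim
RUN apt-get update && \
    apt-get install -y --no-install-recommends redis-server && \
    rm -rf /var/lib/apt/lists/*
CMD ["redis-server", "--protected-mode", "yes"]
EOF

# build and tag it as your own, internally maintained image
docker build -t registry.example.com/myteam/redis:buster .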

At present, I am helping many of my customers to introduce secure building pipelines to their architecture. By embracing DevSecOps, and thus driving development and automation in a different way, you can improve the overall security of in-house applications. At the same time, you can ensure the code provided by suppliers is up to scratch. In this way, the use of Docker can indirectly help maintain code quality.
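As one example of the kind of gate such a pipeline can include -the scanner here is my own illustrative choice, not something tied to any specific customer setup- an image scanner like Trivy can fail the build when known high-severity vulnerabilities are found:

# illustrative CI step: break the build if the image carries HIGH or CRITICAL CVEs
trivy image --exit-code 1 --severity HIGH,CRITICAL registry.example.com/myteam/redis:buster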

Use Docker by all means, but implement it carefully to avoid potential security pitfalls resulting from poor coding practice.

If you require specialist advice on building secure pipelines or security with docker, get in touch with me.

Is OpenStack still a thing?

You know how much I care about OpenStack and how deeply involved I feel in its community. A recent experience in the Netherlands at a premier customer made me think about OpenStack as a whole.

As I said at a conference last year, I am probably the most involved OpenStack consultant in Europe, and yet I've seen more failures than successes. Most failures were due to a lack of expertise in OpenStack and of deep knowledge of Linux, protocols and open source in general. I initially thought that the skills I have are quite common and that there are plenty of people capable of running such an infrastructure, but apparently I was wrong.

Even big brands usually have 1-3 great engineers while the others are average. This is not really a bad thing, but you have to have great skills to manage OpenStack, and most of the time management can't rely on a handful of guys for their business. Many decided to go to public clouds, and some went back to VMware because the skills are easier to find. To be honest, as an entrepreneur, I can't blame them.

Public clouds (AWS, Azure, Google, ...) are easy to embrace and you don't have to maintain hardware, storage or networking, which is very attractive to those customers whose core business is not IT. Public clouds might seem costly at the beginning, but if you look at the real TCO (including labour costs), you find out it is not that much.
And if you are concerned about your privacy, a good VMware cluster is enough for most businesses.

Kubernetes quickly ramped up onto developers' radar in the last year. It's "cool", and containers are a great way for developers to distribute their applications. At the end of the day, companies need to run their applications to make money or support their business. How they do it, they don't really care.

In my humble opinion, Kubernetes is not mature yet, especially in networking and storage, and it still lacks multi-tenancy. But it is slowly getting there. Kubernetes is not that simple to manage, but it's way less complex than OpenStack... and you don't depend on MySQL or RabbitMQ to operate (which is a real pain). So what's the need for OpenStack then?

This is the question I’m asking myself. Probably the number of use cases for OpenStack is quite small now, mostly related to telco operators and NFV.

The only thing Kubernetes is not capable of running is Microsoft Windows applications, but Microsoft has shown interest in porting its apps to Linux (see SQL Server, for example), not to mention that they are actively contributing to Helm.

While I still love OpenStack, we need to face the evidence that interest in OpenStack is slowly fading away. However, its legacy has been invaluable to me and to the community as well. The "Software-Defined" revolution that OpenStack brought, as well as the mindset around automation, is the basis for the next steps of IT.

An era has ended: SecurePass shutdown

GARL announced that SecurePass would cease its official activities in August 2017. As of today, I have shut down all the SecurePass virtual machines.

I am a bit sad, but there are choices you have to make, and sentiment sometimes has to stay far away from business.

This definitely marks the end of an era, but a new one is showing up.

Project “simplification” for 2018

Since the beginning of 2018, I have been working on an "internal" project whose ultimate goal is to simplify my life. 2017 was definitely a stunning year, with a lot of great projects and great results as well; I believe it will be difficult to ever achieve the same again. With great results, however, come also great sacrifices: it was all about work and there was little space left for my own life. "All work and no play makes Jack a dull boy", as the proverb says, so I believe I deserve a little relief from the big pressure.

My new year's resolution was to simplify my life and get a better work/life balance. This "simple" resolution turned out to be more complex and harder than I thought. Since January, I have worked really hard to reduce the number of hassles as much as I can. This is the main reason why you haven't seen me around and why I wasn't very often involved with social media, events, etc.

At the end of June, I can say I'm on the right track, but a lot has yet to come. Stand by for some great announcements 🙂

Alicloud & RedHat Linux 7.4 BYOS

Alibaba Cloud (Alicloud or Aliyun) is a promising Chinese cloud provider that is becoming popular in the Asia-Pacific region. If you want to release services in China and be able to comply with Chinese privacy law, all your data need to stay in China. For this reason, Alicloud can be handy to start your journey in the Asian country.

Most businesses want to have the same certified workloads in China as well, and those are mostly based on RedHat Enterprise Linux (RHEL). Alicloud is a RedHat Certified Cloud Provider and offers RHEL images in their marketplace, but these images include a RedHat subscription. What if you have an Enterprise agreement and you want to use a Bring Your Own Subscription (BYOS) method?

Here are some handy tricks to bring RHEL 7.4 BYOS into Alicloud and start serving your customers in China.

Alicloud supports importing images in RAW and VHD format, which helps us a lot. If you have an active RedHat subscription, you should download the RHEL 7.4 KVM guest image (see image below). This image is compatible with the Alicloud virtualization system; Alicloud also supports cloud-init to customize the virtual machine at boot time. The direct link to the download page is here: https://access.redhat.com/downloads/content/69/ver=/rhel—7/7.4/x86_64/product-software

[Image: the RHEL 7.4 KVM guest image on the RedHat download page]

The next step is converting the QCOW2 image into RAW format. However, the conversion will expand the 500MB QCOW2 image into a 10GB RAW file. Uploading such a big file is problematic if you do not sit in China and have to traverse the Great Firewall of China.

As such, we will upload the QCOW2 image into the Alicloud Object Storage Service (OSS) and convert it using a temporary virtual machine in China. Create a bucket through the console and upload the image. Should you need a GUI to perform the upload, an official client named "OSS Browser" is available here: https://github.com/aliyun/oss-browser/blob/master/all-releases.md

I also strongly recommend downloading ossutil64, a CLI-based tool for OSS, so you can upload the converted image from the temporary Linux instance. The tool is available here: https://www.alibabacloud.com/help/doc-detail/50452.htm

Create a small Linux instance with the distro of your choice (I recommend CentOS) in your Chinese region (Beijing, in my case), making sure it has sufficient disk space. Once the instance is reachable, log in and download the QCOW2 image from the bucket using curl and the object URL.
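For example, assuming the bucket sits in the Beijing region (the bucket name and object URL below are placeholders; on CentOS the conversion tool comes from the qemu-img package):

# install the conversion tool and fetch the image from the bucket (URL is illustrative)
sudo yum install -y qemu-img
curl -o rhel-server-7.4-x86_64-kvm.qcow2 "https://my-image-bucket.oss-cn-beijing.aliyuncs.com/rhel-server-7.4-x86_64-kvm.qcow2"

Then convert it to RAW with the qemu-img tool: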

qemu-img convert -f qcow2 -O raw rhel-server-7.4-x86_64-kvm.qcow2 rhel-server-7.4-x86_64-kvm.img

Once converted, use ossutil64 to upload the image to the bucket you created earlier.
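A sketch of that step, assuming the OSS endpoint of the Beijing region and placeholder credentials and bucket name:

# one-time configuration with your endpoint and access keys (values are placeholders)
./ossutil64 config -e oss-cn-beijing.aliyuncs.com -i <AccessKeyID> -k <AccessKeySecret>

# copy the converted RAW image into the bucket
./ossutil64 cp rhel-server-7.4-x86_64-kvm.img oss://my-image-bucket/rhel-server-7.4-x86_64-kvm.img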

[Screenshot: the OSS bucket listing with the uploaded RAW image]

If you click on the file, you can get its public URL in the preview. Copy the file URL, as we will feed it into the image importer.

[Screenshot: the OSS object detail view showing the file URL]

Go back to the Elastic Compute Service (ECS), select Image in the menu on the left and start the import through the "Import Image" function. In the OSS Object Address field, paste the URL you copied before. Use Linux as the operating system and RedHat as the system platform. Make sure to specify RAW as the image format.

[Screenshots: the Import Image dialog in the ECS console]

The Alicloud image service will (slowly) import the image. If everything is successful, you should see an image similar to the one below:

[Screenshot: the imported image listed in the ECS image catalogue]

You can now start a virtual machine from your newly created image and register your RedHat subscription with subscription-manager 🙂
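For example, assuming you have valid portal credentials (the values below are placeholders), registering the BYOS instance boils down to:

# attach the instance to your own RedHat subscription
subscription-manager register --username <rhn-user> --password <rhn-password> --auto-attach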

Outside of “The Net”

I'd like to share with you something that happened to a friend of mine a couple of days ago. He runs a small cloud provider and acts as an outsourcer for selected customers. A very big firm in his country decided to move its brand-new website to one of his datacenters.

He runs two datacenters for disaster recovery and business continuity. Each datacenter has its own provider-independent IPs, a different ASN and different upstream providers.

What happened is that, once he moved the new website, Google delisted it from its search engine. There was absolutely no trace of this company when searching, except for its famous products on the Amazon marketplace. Needless to say, the customer's marketing department and the developers were blaming my friend.

An initial investigation showed that Google had failed to retrieve the robots.txt file it needs to index the website, so it decided to delist the site. Funnily enough, other search engines (e.g. Bing and Qwant) were able to retrieve the very same file. In the access logs and in tcpdump captures there was no sign of the Google crawler.
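A quick way to reproduce part of that check -as a sketch, with a placeholder hostname, and keeping in mind that spoofing the user agent only mimics the request, not Google's network path, which was the real problem here- is fetching robots.txt with Googlebot's user-agent string:

# check that robots.txt is served when requested with Googlebot's user agent
curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://www.example.com/robots.txt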

During a test, he was able to "restore" the situation by moving the complex website, with its e-commerce platform, to the other datacenter. A deeper investigation revealed that -for some unknown reason- Google seemed to have blocked the ASN's IPs, while other search engines and the rest of the world were able to access the website. When he contacted the Google NOC, they said that the Google search engine and the webmaster tools are unsupported, so basically my friend was on his own. For equally unknown reasons, after a couple of weeks the datacenter's ASN IPs were reachable by Google again.

This reminds me of my previous posts, in which I wrote about how the Internet was designed to be as independent as possible from any central point, while information is now more and more centralized in a few companies. Of course, there was no malicious intent from Google in blocking my friend's IPs, but it turns out that one of these companies has the potential power to decide whether you can run your business or not.

The same thing could potentially happen to a public cloud provider: what if Amazon decides to shut down your machines (and it has the right to do so!)?

I'm not against any cloud provider, and we need to thank AWS and Azure for bringing such inspiring innovation to the world of IT. But, as I stated in previous posts, we need to be ready to bring our business back on-premises if forced to do so.

Just a couple of hints:

  1. Create your local micro-cloud on-premises, say with OpenStack and Kubernetes, so that you can start and scale up quickly
  2. Use open data and open standards, and avoid any layered product offered by the cloud provider; it will lock you in.
  3. Automate deployments as much as you can, so that they are reproducible and can be run on-premises

The idea I'm currently advocating is to apply the Raiffeisen model to IT, to foster a complementary alternative to public clouds and big outsourcers, so that heterogeneous enterprises in a local territory can team up to create a small micro-cloud and save money.

My wife wants the receipt: an analysis of cloud adoption in Europe

My personal goals for 2018 are simplification and the reduction of everyday annoyances. One of them is the proliferation of receipts, which multiply like gremlins: the "collection" of receipts at home had reached an unacceptable level.

A few days ago I installed on my wife's phone the app of a well-known supermarket chain, since it offers virtual receipts. When you pay, the supermarket immediately creates a PDF, which can be viewed both through the app and on the website.

Even though the application is very easy to use, after a few shopping trips on her own my wife got angry: "how do I use this thing" and "I can't see how many points I have or whether they made a mistake", she said. Even though she only had to look at the app, she practically forced me to disable the virtual receipt feature: the habit of the physical receipt won.

You may wonder: nice story, but what does it have to do with the cloud?

Well, in my many consulting engagements I see that, with some types of customers, the habit of having "something physical" never dies.

In 2017 I did a big job -together with my team- to move a small London merchant bank entirely to Amazon Web Services. Since they had no internal IT staff, only people taking care of desktop support, my idea was to remove any on-site hardware that was not strictly necessary to keep the desktops running. If something breaks, somebody has to fix it, right? And if there is nobody, who replaces (say) a failed disk?

The management was actually very much in favour of not having operational "headaches", so once we got past the legal & compliance gauntlet, we proceeded slowly with the migration, being careful not to "break anything".

Now, a little more than a year later and with the migration completed, the customer has asked to go back. Not because of technical problems, nor because of performance: with a fast, redundant line, just a few hops away from AWS, the feeling was of being only marginally slower than local servers.

So what is the problem?? The fear of no longer having the data "in the closet" and of losing control triggered a psychological mechanism in the CEO that led him to decide to go back, even with a higher TCO and the burden of handling possible faults. Let me point out that I'm talking about a bank in the City of London, not "Uncle Tonino's" workshop.

What did this story teach me?

It taught me that technology puts an infinite number of tools and possibilities at our disposal, but some mindsets are really hard to uproot.

The more I visit customers around Europe, the more I witness a real paradox. With the advent of fibre and high-speed radio links, the European SMEs that would benefit the most from using the cloud are the ones most reluctant to change. On the contrary, the large companies that could achieve economies of scale by adopting a private cloud, besides retaining more control over data security, turn instead to the public cloud (AWS, Azure, Google Compute Engine), because that way they have "fewer headaches" managing the hardware life cycle and their internal processes.

So what can we consultants do?

My experience as a Linux enthusiast taught me that religious wars lead nowhere and that -in the end- it's the customer who pays. Our role is therefore to advise the customer as best we can, according to what they want to do.

While we wait for some technologies to be better "digested", I've seen that a winning strategy for customers who want their hardware on-premises is to offer cloud services for the web front-end (for image reasons), and above all to offer the possibility of fast, low-cost disaster recovery.

On the other hand, to those who have everything in the cloud we can propose building a small internal environment on which to rest the infrastructure, for example a private cloud based on OpenStack with just 3 nodes, an object storage for backups, or a Kubernetes/Docker setup, keeping ready to "scale up" with automation when, "in an emergency", we have to switch the in-house systems back on.