Skip to content

Add s3 reliability test#44

Open
lukesteensen wants to merge 2 commits into
masterfrom
reliability-tests
Open

Add s3 reliability test#44
lukesteensen wants to merge 2 commits into
masterfrom
reliability-tests

Conversation

@lukesteensen

Copy link
Copy Markdown
Member

While far from perfect, this works to spin up the equivalent reliability test to the old test env repo.

I'll comment on things of note, but I'm curious how this looks overall.

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
---
- name: Install Vector configuration
template:
copy:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to get around an issue where template tries to fill in the vector config's templates. We can probably get around it with escaping, but this seemed safer.

It also opens the question of, if we actually wanted to template some things in, how would we do it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We can use different variable_end_string and variable_start_string when resolving vector templates.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, @lukesteensen we use templates to insert addresses, ports, etc, so we'll need to revert this. I'll try to submit a PR today that changes this to use different variable_end_string and variable_start_string values.

Description=verify-logs

[Service]
ExecStart=/usr/local/bin/verifiable-logger verify file-to-s3-reliability-test-data us-east-1 --prefix "host=%H" --tail

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty hardcoded right now, but should obviously be templated in once we figure out the right way to get the variables in here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would highly recommend not adding support for environment variables in Systemd. I also ran into weird issues when I passed more than one flag. It was a mess. But if you look at the http_test_server role I used a template for that service file.

Comment thread bin/test Outdated
You can run this test via:

```
test -t file_to_s3_reliability

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing missing here: how do we shut down the test and destroy all the resources?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should add a bin/teardown -t <test> -c <config>. The terraform destroy command should make this easy. It should only be called for the local test state, not global state, obviously.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.

What's wrong with CloudWatch? I think it works quite well for this.

@MOZGIII MOZGIII Mar 26, 2020

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloudWatch works!
But if we create resources per test - then if we have bin/teardown, we have to invoke it with enough context so that it only deletes resources associated with a particular run.

So this is where we use lambda: we can assign the invocation of bin/teardown with all required context, delayed by, let's say, three hours. We will then schedule this lambda invocation as part of bin/test run. It'll allow us to not only clean up the VMs, but also all the associated terraform state - policies, S3 buckets, VPC, and everything else that we create per-test.

I think this solution can replace our CloudWatch VMs removal because we can clean up everything, including the VMs!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should solve it for now: #52

Comment on lines +11 to +12
region = "us-east-1"
bucket = "file-to-s3-reliability-test-data"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these should be templated in somehow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with this stuff being hard coded since it's all contained within this test. We could use variables (this is what the configurations folder is for in the test, but I don't think it matters too much.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is S3 bucket names have global scope, so we either have to use a different one each test run, or make this part of the state global. I think making it global is worse than templating the names.

@@ -0,0 +1,2 @@
---
foo: "bar"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't let me not have this file.

Presumably, we could use this for things like the bucket name but a few things weren't clear:

  1. How to template into the vector toml while leaving templated fields alone
  2. How to get this same variable into terraform

@MOZGIII MOZGIII Mar 26, 2020

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. How to get this same variable into terraform

We do the templating at part of bin/test and our lib facility, and pass it to both ansible and terraform!


resource "aws_s3_bucket" "logs-bucket" {
# data is namespaced by host within the bucket
bucket = "file-to-s3-reliability-test-data"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, should be templated.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would template this so it's namespaced like our instance names. Ex:

vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data

Comment thread cases/file_to_s3_reliability/terraform/variables.tf Outdated

module "vpc" {
source = "../../../terraform/aws_vpc"
source = "../aws_vpc"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I was doing something wrong, but nothing worked at all without changing these paths.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Test harness seems to work on master currently...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked, doesn't work on my end either. Created #45.

}

module "topology" {
source = "../../../terraform/aws_uni_topology"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change this from the case I copied to get things working.

@MOZGIII MOZGIII left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea!

While reading, I thought - what if we use a global shared S3 bucket? But it's better this way. We definitely need to template the bucket name.
We probably should introduce a run_id, derived from the user_id + a random string. This would also be useful for proper terraform state isolation - something I wanted to work on as part of the task to run multiple test cases in parallel.

---
- name: Install Vector configuration
template:
copy:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We can use different variable_end_string and variable_start_string when resolving vector templates.

You can run this test via:

```
test -t file_to_s3_reliability

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we kill the VMs via CloudWatch alert. We need a better way, maybe could use lambda for it.


module "vpc" {
source = "../../../terraform/aws_vpc"
source = "../aws_vpc"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd. Test harness seems to work on master currently...

@binarylogic binarylogic left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Really happy to see this implemented. It definitely raises our confidence that Vector will work reliably over long periods. I think we should clean up these final items.

Also, didn't see anything in here about Slack notifications, etc. I assume that is hard-coded into the verifiable logger? I'm wondering if we can generalize this communication strategy somehow? I don't want to overthink this, but I know we'll need "alerts" of some kind for other tests in the future. We could even throw them on a queue and handle them "generally" out of band. Just thinking out loud a little bit.


resource "aws_s3_bucket" "logs-bucket" {
# data is namespaced by host within the bucket
bucket = "file-to-s3-reliability-test-data"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I would template this so it's namespaced like our instance names. Ex:

vector-test-${var.user_id}-${var.test_name}-${var.test_configuration}-test-data

Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants