Looking to see if a particular character exists in a string SAS - sas

I am working with customer data, part of which looks at customers' email addresses. Unfortunately there are next to no controls on the fields where customer data is entered into the system, so the data requires scrubbing.
Using the current email field, I want to create a new field that is populated with the customer's email address if "#" exists in it; if it doesn't exist, I will populate the new field with a blank value.
For example:
Current Email Address       New Email Address
customer1#business1.com     customer1#business1.com
customer2#business2.com     customer2#business2.com
customer3business3.com      (blank)
Can anyone help - I have scoured the internet and cannot find anything that would do this!!
Thanks

You'd probably want more controls than this to validate an email address, but here you go:
data have;
    infile cards;
    input cur_email :$50.;
    cards4;
customer1#business1.com
customer2#business2.com
customer3business3.com
;;;;
run;

/* Copy the address only when it contains "#"; otherwise new_email stays blank. */
data want;
    set have;
    if index(cur_email, "#") then new_email = cur_email;
run;

If you want to search for a string within the email address like 'gmail' then you can use this:
if COMPRESS(TRANWRD(cur_email,'gmail','~'),'~','k')='~' then new_email=cur_email;
or to be in keeping with the first answer:
if INDEX(TRANWRD(cur_email,'gmail','~'),'~') then new_email=cur_email;

Related

Kettle database lookup case insensitive

I've a table "City" with more than 100k records.
The field "name" contains strings like "Roma", "La Valletta".
I receive a file with the city name, all in upper case as in "ROMA".
I need to get the id of the record that contains "Roma" when I search for "ROMA".
In SQL, I must do something like:
select id from city where upper(name) = upper(%name%)
How can I do this in kettle?
Note: if the city is not found, I use an Insert/update field to create it, so I must avoid duplicates generated by case-sensitive names.
You can make use of the String Operations step in Pentaho Kettle. Set the Lower/Upper option to Y.
Pass the city name from the City table through the String Operations step, which will upper-case the city name in your data stream. Then join/look up against the received file to get the required id.
More on the String Operations step can be found in the Pentaho wiki.
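As a rough illustration of the idea above (normalise the case on both sides, then look up), here is a minimal Python sketch outside of Kettle. The sample rows and the names city_rows and find_city_id are made up for the example; in a real transformation the String Operations and lookup steps would do this work.

```python
# Hypothetical sample rows standing in for the "City" table: (id, name).
city_rows = [(1, "Roma"), (2, "La Valletta")]

# Build the lookup once, keyed on the upper-cased name.
city_id_by_upper_name = {name.upper(): city_id for city_id, name in city_rows}


def find_city_id(incoming_name):
    """Return the id for a city name regardless of case, or None if absent."""
    return city_id_by_upper_name.get(incoming_name.strip().upper())


print(find_city_id("ROMA"))         # id of "Roma"
print(find_city_id("la valletta"))  # id of "La Valletta"
print(find_city_id("Paris"))        # None: not in the table
```

Because the table is read once and kept in memory, this avoids one database query per incoming row.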
You can use a 'Database join' step. Here you can write the sql:
select id from city where upper(name) = upper(?)
and specify the city field name from the text file as the parameter. With 'Number of rows to return' and 'Outer join?' you can control the join behaviour.
This solution doesn't work well with a large number of rows, as it will execute one query per row. In those cases Rishu's solution is better.
This is how I did it:
First, a "Modified JavaScript value" step to create the query:
var queryDest="select coalesce( (select id as idcity from city where upper(name) = upper('"+replace(mycity,"'","\'\'")+"') and upper(cap) = upper('"+mycap+"') ), 0) as idcitydest";
Then I use this string as a query in a Dynamic SQL row.
After that,
IF idcitydest == 0 then
insert new city;
else
use the found record
This approach runs one query per row of the file, but it uses little cache memory.
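For comparison, the same look-up-then-insert logic can be sketched in Python with bound parameters instead of building the SQL string by hand, which sidesteps the quote escaping. This is only a sketch under assumptions: it uses an in-memory SQLite table as a stand-in for the real database, the column types are guesses, and get_or_create_city_id is a hypothetical helper name. Note that SQLite's built-in upper() only folds ASCII letters.

```python
import sqlite3


def get_or_create_city_id(conn, name, cap):
    """Return the id of the city matching name and cap case-insensitively,
    inserting a new row when no match exists."""
    row = conn.execute(
        "SELECT id FROM city WHERE upper(name) = upper(?) AND upper(cap) = upper(?)",
        (name, cap),
    ).fetchone()
    if row is not None:
        return row[0]  # use the found record
    cur = conn.execute("INSERT INTO city (name, cap) VALUES (?, ?)", (name, cap))
    conn.commit()
    return cur.lastrowid  # id of the newly inserted city


# Stand-in database with one existing city (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, cap TEXT)")
conn.execute("INSERT INTO city (name, cap) VALUES ('Roma', '00100')")

existing_id = get_or_create_city_id(conn, "ROMA", "00100")        # found
new_id = get_or_create_city_id(conn, "Sant'Angelo", "26866")      # inserted
same_new_id = get_or_create_city_id(conn, "SANT'ANGELO", "26866") # found again
city_count = conn.execute("SELECT count(*) FROM city").fetchone()[0]
```

The apostrophe in the second city name needs no escaping because the value is passed as a parameter rather than concatenated into the query.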

SAS set statement using colon and creating a filename variable

So using SAS, I have a number of SAS monthend datasets named as follows:
mydata_201501
mydata_201602
mydata_201603
mydata_201604
mydata_201605
...
mydata_201612
Each has account information at a particular month end. I want to stack the datasets into one dataset using the colon wildcard rather than writing out the full set statement, as follows:
data mynewdata;
set mydata_:;
run;
However, there is no datestamp variable within the datasets, so when I stack them I will lose the month-end information for each account. I want to know which row refers to which month end for each account. Is there a way I can automatically create a variable that names the table the row comes from? For example, the long-winded way would be this:
data mynewdata;
set mydata_201501 (in=a) mydata_201502 (in=b) mydata_201503 (in=c)...;
if a then tablename = 'mydata_201501';
if b then tablename = 'mydata_201502';
if c...
run;
but is there a quicker way using colon along these lines?
data mynewdata;
set mydata_:;
tablename = _tablelabel_;
run;
thanks
I always find clicking on comment links annoying, so hopefully here's the answer in your context. Use the INDSNAME= SET statement option to assign the dataset name to a variable:
data mynewdata;
set mydata_: indsname=_tablelabel_;
tablename = _tablelabel_;
run;
N.B. you can call _tablelabel_ whatever you want, and you may wish to change it so it doesn't look like a SAS generated variable name.
INDSNAME= only became a SET statement option in SAS 9.2.
Just to be clear: in my case the datasets were named mydata_yyyymm and I wanted a monthend variable holding the date. I was able to produce this using the solution provided by mjsqu as follows (the obs= and keep= options are shown in case they are needed):
data mynewdata;
set mydata_: (obs=100 keep=xxx xxx) indsname=_tablelabel_;
format monthend yymmdd10.;
monthend = input(scan(_tablelabel_,-1,'_'),yymmn6.);
run;
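To make the scan/input step above easier to follow, here is a small Python sketch of the same parsing. It assumes the name arrives in a form like WORK.MYDATA_201501 and, as far as I know, mirrors the yymmn6. informat in setting the day to the first of the month. The function names are made up for illustration, and the second one is an optional extra for anyone who wants the actual last day of the month.

```python
import calendar
from datetime import date, datetime


def month_from_table_name(table_name):
    """Take the text after the last underscore (yyyymm) and return
    the first day of that month."""
    suffix = table_name.rsplit("_", 1)[-1]
    return datetime.strptime(suffix, "%Y%m").date()


def month_end_from_table_name(table_name):
    """Return the last calendar day of the month encoded in the table name."""
    first = month_from_table_name(table_name)
    last_day = calendar.monthrange(first.year, first.month)[1]
    return first.replace(day=last_day)


print(month_from_table_name("WORK.MYDATA_201501"))
print(month_end_from_table_name("WORK.MYDATA_201602"))
```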

How to fetch last inserted record for particular id?

Apologies, I am completely new to Django. I have 20 records in my database table; suppose 10 of them have the same ID, and I want to fetch the last inserted record for that ID. I have a date column in my table. How can I do that?
last_obj = YourModel.objects.last()
But generally, you can't create more than one object with the same id unless you have specified your own id field to replace the built-in one. And even then, it's a bad idea.
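If the shared ID is an ordinary (non-primary-key) column, the question boils down to "filter by that ID, then take the row with the latest date". Below is a plain-Python sketch of that logic with made-up data; item_id, created and latest_for are hypothetical names. In Django the corresponding query would presumably be something like YourModel.objects.filter(item_id=7).order_by('-created').first(), with the field names adjusted to the real model.

```python
from datetime import date

# Hypothetical rows: several records share the same (non-primary-key) item_id.
records = [
    {"item_id": 7, "created": date(2016, 1, 5), "value": "a"},
    {"item_id": 7, "created": date(2016, 3, 9), "value": "b"},
    {"item_id": 8, "created": date(2016, 2, 1), "value": "c"},
    {"item_id": 7, "created": date(2016, 2, 20), "value": "d"},
]


def latest_for(item_id, rows):
    """Return the row with the most recent date for item_id, or None if there is none."""
    matching = [row for row in rows if row["item_id"] == item_id]
    return max(matching, key=lambda row: row["created"], default=None)


print(latest_for(7, records))
print(latest_for(99, records))
```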

Regex QueryString Parsing for a specific in BigQuery

So last week I was able to begin to stream my Appengine logs into BigQuery and am now attempting to pull some data out of the log entries into a table.
The data in protoPayload.resource is the page requested, with the querystring parameters included.
The contents of protoPayload.resource looks like the following examples:
/service.html?device_ID=123456
/service.html?v=2&device_ID=78ec9b4a56
I am getting close, but when there is another parameter before device_ID, I am not getting it. As you can see I am not great with regex, but it is the only way I can think of to parse the data in the query. To get just the device ID from the first example, I was able to use the following query, which works great. My next challenge is to get the data when a second parameter exists. The device IDs can vary in length from about 10 to 26 characters.
SELECT
RIGHT(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'),
length(Regexp_extract(protoPayload.resource,r'[\?&]([^&]+)'))-10) as Device_ID
FROM logs
What I would like is just the values from the querystring device_ID such as:
123456
78ec9b4a56
Assuming you have just 1 query string per record then you can do this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=(.*)$') as device_id FROM mytable
The part within the parentheses will be captured and returned in the result.
If device_ID isn't guaranteed to be the last parameter in the string, then use something like this:
SELECT REGEXP_EXTRACT(protoPayload.resource, r'device_ID=([^\&]*)') as device_id FROM mytable
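To check the pattern outside BigQuery, here is a minimal Python sketch that applies the same regular expression to the example values from the question. The helper name extract_device_id and the extra test strings are assumptions for illustration only.

```python
import re

# Same pattern as in the query above: capture everything after
# "device_ID=" up to the next "&" or the end of the string.
DEVICE_ID = re.compile(r"device_ID=([^&]*)")


def extract_device_id(resource):
    """Return the device_ID value from a request path, or None if it is missing."""
    match = DEVICE_ID.search(resource)
    return match.group(1) if match else None


for resource in (
    "/service.html?device_ID=123456",
    "/service.html?v=2&device_ID=78ec9b4a56",
    "/service.html?device_ID=abc123&v=2",
):
    print(resource, "->", extract_device_id(resource))
```

Because the character class stops at "&", the position of device_ID within the querystring does not matter.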
One approach is to split protoPayload.resource into multiple service entries and then apply the regexp to each one. This way it will support an arbitrary number of device_ID values, e.g.
select regexp_extract(service_entry, r'device_ID=(.*$)') from
(select split(protoPayload.resource, ' ') service_entry from
(select
'/service.html?device_ID=123456 /service.html?v=2&device_ID=78ec9b4a56'
as protoPayload.resource))

Retrieve Column Based On Data Step Variable

I'm writing a SAS job. For this SAS job, I need to do the following --
Retrieve the value of the field ActiveColumn. This value will be the name of another column in the table.
Set ActiveValue equal to the value of the field named by ActiveColumn.
Basically, I'm trying to write a version of this where I don't have to write out every column name beforehand --
Select(ActiveColumn);
when ('CITY') ActiveValue = City;
when ('STATE') ActiveValue = State;
when ('ZIP') ActiveValue = Zip;
otherwise;
end;
What is the simplest way to do this?
Thank you very much!
This sounds like a vertical transpose. That would be done something like this, if all fields are character:
data want;
set have;
array fields city state zip;
do _t = 1 to dim(fields);
if lowcase(activeColumn)=lowcase(vname(fields[_t])) then activeValue=fields[_t];
*may want an OUTPUT here.;
end;
run;
If they are mixed type you would need two arrays and loops. You might not need ActiveColumn if you are intending to just loop over all fields anyway; you can just set ActiveColumn to vname(fields[_t]) in the loop.
If you are intending to have this be more flexible, you can use array fields _character_;
which will use all character variables (thus meaning you don't have to explicitly specify them).
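The loop above is essentially "find the column whose name matches the value stored in ActiveColumn, ignoring case". A small Python sketch of that matching logic, using a dictionary as a stand-in for one row, might look like this. The sample row and the function name active_value are made up for the example.

```python
def active_value(row):
    """Return the value of the column named by row['ActiveColumn'],
    matching column names case-insensitively; None if nothing matches."""
    wanted = row["ActiveColumn"].strip().lower()
    for column, value in row.items():
        # Skip the ActiveColumn field itself so it can never match its own name.
        if column != "ActiveColumn" and column.lower() == wanted:
            return value
    return None


# Hypothetical row: ActiveColumn holds the name of another column.
row = {"City": "Boston", "State": "MA", "Zip": "02108", "ActiveColumn": "STATE"}
print(active_value(row))
```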