Clojure, transform collection to map - clojure

I was wondering what the best way is to iterate over several collections to create a map in Clojure. I have 3 collections:
("Aujourd'hui" "Demain" "25.11" "26.11" "27.11" "28.11" "29.11")
("2 °C" "2 °C" "1 °C" "0 °C" "-3 °C" "-4 °C" "0 °C")
("8 °C" "6 °C" "4 °C" "2 °C" "1 °C" "1 °C" "5 °C")
And I would like to create a collection of maps that looks like this:
{:date Aujourd'hui :temp-min 2°C :temp-max 8°C}{...}
I know it should not be so difficult, but I can't figure out how to do it right.
Thanks for your help !

We can use map to construct a hash-map for each index of the collections. When given more than one collection, map walks through them in parallel, calling the function with one element from each, and stops when the shortest collection runs out.
user> (let [dates '("Aujourd'hui" "Demain" "25.11" "26.11" "27.11" "28.11" "29.11")
mins '("2 °C" "2 °C" "1 °C" "0 °C" "-3 °C" "-4 °C" "0 °C")
maxes '("8 °C" "6 °C" "4 °C" "2 °C" "1 °C" "1 °C" "5 °C")]
(pprint (map #(hash-map :date %1 :temp-min %2 :temp-max %3) dates mins maxes)))
({:date "Aujourd'hui", :temp-max "8 °C", :temp-min "2 °C"}
{:date "Demain", :temp-max "6 °C", :temp-min "2 °C"}
{:date "25.11", :temp-max "4 °C", :temp-min "1 °C"}
{:date "26.11", :temp-max "2 °C", :temp-min "0 °C"}
{:date "27.11", :temp-max "1 °C", :temp-min "-3 °C"}
{:date "28.11", :temp-max "1 °C", :temp-min "-4 °C"}
{:date "29.11", :temp-max "5 °C", :temp-min "0 °C"})

The following function constructs a table as a sequence of records from column heading titles and sequence of columns:
(defn build-table [titles columns]
(apply map (fn [& xs] (zipmap titles xs)) columns))
There should be as many titles as there are columns.
For example,
(build-table [:date :temp-min :temp-max] data)
where
(def data ['("Aujourd'hui" "Demain" "25.11" "26.11" "27.11" "28.11" "29.11")
'("2 °C" "2 °C" "1 °C" "0 °C" "-3 °C" "-4 °C" "0 °C")
'("8 °C" "6 °C" "4 °C" "2 °C" "1 °C" "1 °C" "5 °C")])
... produces
({:temp-max "8 °C", :temp-min "2 °C", :date "Aujourd'hui"}
{:temp-max "6 °C", :temp-min "2 °C", :date "Demain"}
{:temp-max "4 °C", :temp-min "1 °C", :date "25.11"}
{:temp-max "2 °C", :temp-min "0 °C", :date "26.11"}
{:temp-max "1 °C", :temp-min "-3 °C", :date "27.11"}
{:temp-max "1 °C", :temp-min "-4 °C", :date "28.11"}
{:temp-max "5 °C", :temp-min "0 °C", :date "29.11"})
This leaves all the data elements as strings. Converting them to numbers, preferably with units attached, can be tackled independently. As written in the question, values such as 2°C are not valid Clojure.

PowerBI: copy previous dates values in case of missing dates

I have a rather large table in PowerBI that looks as follows:
Date1       ID1   ID2  Date2       Amount1  Amount2  Amount3
04.02.2022  1234  12   04.02.2022  5        3        8
04.02.2022  1234  13   04.02.2022  5        3        8
04.02.2022  1235  14   04.02.2022  6        3        9
06.02.2022  1234  10   06.02.2022  20       23       46
06.02.2022  1238  11   06.02.2022  20       23       46
06.02.2022  1238  14   06.02.2022  26       23       49
As in the case above, if e.g. 05.02.2022 is missing, I would like my end result to look like
Date1       ID1   ID2  Date2       Amount1  Amount2  Amount3
04.02.2022  1234  12   04.02.2022  5        3        8
04.02.2022  1234  13   04.02.2022  5        3        8
04.02.2022  1235  14   04.02.2022  6        3        9
05.02.2022  1234  12   05.02.2022  5        3        8
05.02.2022  1234  13   05.02.2022  5        3        8
05.02.2022  1235  14   05.02.2022  6        3        9
06.02.2022  1234  10   06.02.2022  20       23       46
06.02.2022  1238  11   06.02.2022  20       23       46
06.02.2022  1238  14   06.02.2022  26       23       49
This means that everything from 04.02.2022 is copied, just with a new date, 05.02.2022.
There are also cases where no data is available for 2 or 3 days; in those instances I would need all the data from the last known date repeated until we have data again.
Does someone know how to implement this in PowerBI?
Thank you!
The following should work for you. I have named your sample data query as Table.
Create a new query and paste in the following code. This new query refers to your sample data query named Table so you will have two queries.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjDRMzDSMzIwMlKK1QFyTVG5ZghuLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Date = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Date"}, Table, {"Date1"}, "Table", JoinKind.LeftOuter),
#"Added Custom" = Table.AddColumn(#"Merged Queries", "Count", each if Table.RowCount([Table]) > 0 then [Table] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Count"}),
#"Added Custom1" = Table.AddColumn(#"Filled Down", "Custom", each if Table.RowCount([Table]) > 0 then [Table] else Table.ReplaceValue([Count],[Count]{0}[Date1],[Date],Replacer.ReplaceValue,{"Date1", "Date2"})),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Table", "Count"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Date1", "ID1", "ID2", "Date2", "Amount1", "Amount2", "Amount3"}, {"Date1", "ID1", "ID2", "Date2", "Amount1", "Amount2", "Amount3"}),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Custom",{"Date"})
in
#"Removed Columns1"
If you need to fill more dates, just change the date range in step 1 (the Source step, which holds the Date column); you should be able to generate it automatically from your data.

printing a line when a certain word is found in a file (c++)

I'm trying to write a program that involves a text file. My goal is to get every line that starts with the string "From:" and put that line, without the "From:", into an array.
This is what I've come up with so far, but it seems to do the exact opposite of what I want: it saves every line that DOESN'T have "From:" in it.
#include <iostream>
#include <fstream>
#include <vector>
#include <queue>
#include <list>
using namespace std;
int main()
{
ifstream file;
string array[10000];
string line;
string yep = "From:";
int i = 0;
file.open("myFile.txt"); //name of file im trying to open
while(!file.eof())
{
getline(file, line);
if (line.find((yep)))
{
array[i++]=line;
}
}
file.close();
cout<< array[0] << endl;
cout<< array[1] << endl;
cout<< array[2] << endl;
cout<< array[3] << endl;
cout<< array[4] << endl;
cout<< array[5] << endl;
cout<< array[6] << endl;
cout<< array[7] << endl;
cout<< array[9] << endl;
cout<< array[10] << endl;
cout<< array[11] << endl;
cout<< array[12] << endl;
cout<< array[13] << endl;
cout<< array[14] << endl;
cout<< array[15] << endl;
cout<< array[16] << endl;
cout<< array[17] << endl;
}
here is what is in "myFile.txt."
From: Seoul, South Korea
To : Toronto, Canada
Luxembourg, Luxembourg
Seattle, United States
Dhaka, Bangladesh
Guatemala City, Guatemala
From: Tokyo, Japan
To : Ottawa, Canada
Vilnius, Lithuania
Rome, Italy
From: Hong Kong, SAR
To : New York City, United States
New Delhi, India
Washington, United States
Dublin, Ireland
Lisbon, Portugal
Vienna, Austria
Santiago, Chile
Rio de Janeiro, Brazil
Berlin, Germany
From: London, United Kingdom
To : Accra, Ghana
From: Osaka, Japan
To : Guatemala City, Guatemala
Helsinki, Finland
Detroit, United States
Vienna, Austria
Shenzhen, People's Republic of China
Harare, Zimbabwe
Montreal, Canada
Jakarta, Indonesia
Birmingham, United Kingdom
From: Geneva, Switzerland
To : Ljubljana, Slovenia
Kiev, Ukraine
Pittsburgh, United States
Bratislava, Slovakia
Mumbai, India
Luxembourg, Luxembourg
Rome, Italy
Lisbon, Portugal
Abu Dhabi, United Arab Emirates
From: Copenhagen, Denmark
To : Beijing, People's Republic of China
Pittsburgh, United States
From: Zurich, Switzerland
To : Seattle, United States
Caracas, Venezuela
Manila, Philippines
From: Oslo, Norway
To : Dusseldorf, Germany
Shanghai, People's Republic of China
New Delhi, India
White Plains, United States
Pittsburgh, United States
Denver, United States
Rome, Italy
Quito, Ecuador
Buenos Aires, Argentina
From: New York City, United States
To : Abidjan, Cote d'Ivoire
Casablanca, Morocco
Jakarta, Indonesia
Copenhagen, Denmark
Kingston, Jamaica
From: St. Petersburg, Russia
To : Amman, Jordan
From: Milan, Italy
To : Houston, United States
Mexico City, Mexico
From: Beijing, People's Republic of China
To : Ljubljana, Slovenia
Kuala Lumpur, Malaysia
Dhaka, Bangladesh
Melbourne, Australia
Sydney, Australia
Casablanca, Morocco
Munich, Germany
Istanbul, Turkey
here is the output:
To : Toronto, Canada
Luxembourg, Luxembourg
Seattle, United States
Dhaka, Bangladesh
Guatemala City, Guatemala
To : Ottawa, Canada
Vilnius, Lithuania
Rome, Italy
New Delhi, India
Washington, United States
Dublin, Ireland
Lisbon, Portugal
Vienna, Austria
Santiago, Chile
Rio de Janeiro, Brazil
Berlin, Germany
To : Accra, Ghana
The issue is your if statement. The string::find() method returns the index at which the first occurrence of the substring begins. For a line that starts with "From:", that index is 0, which converts to false, so the if body is skipped. If the substring is not present, find() returns std::string::npos, which is non-zero and therefore converts to true, so every line that does not start with "From:" gets stored.
What you should do is switch from if (line.find((yep))) to if (line.find(yep) == 0) if the "From:" needs to be at the beginning of the line, or to if (line.find(yep) != string::npos) if it just needs to be contained anywhere in the line.
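As a rough sketch of how the "starts with" variant could look, including the step of dropping the "From:" text that the question asks for: the helper names strip_prefix and collect_from_lines below are made up for illustration and are not part of the original code. The sketch uses compare() rather than find() == 0, an alternative that only inspects the start of the line, and it assumes the blanks after "From:" should be dropped as well.

```cpp
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): if `line` begins with
// `prefix`, store the remainder in `rest` and return true.
bool strip_prefix(const std::string& line, const std::string& prefix,
                  std::string& rest)
{
    // compare() looks only at the first prefix.size() characters, so it
    // never scans the rest of the line the way find() would.
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;
    rest = line.substr(prefix.size());
    // Assumption: blanks right after the prefix are unwanted,
    // so " Seoul" becomes "Seoul".
    const std::string::size_type first = rest.find_first_not_of(" \t");
    rest = (first == std::string::npos) ? std::string() : rest.substr(first);
    return true;
}

// Hypothetical helper: collect the text after "From:" for every line of
// the stream that starts with "From:".
std::vector<std::string> collect_from_lines(std::istream& in)
{
    std::vector<std::string> result;
    std::string line, rest;
    // Testing getline() directly avoids the extra iteration that a
    // while (!file.eof()) loop can run after the last line.
    while (std::getline(in, line))
        if (strip_prefix(line, "From:", rest))
            result.push_back(rest);
    return result;
}
```

With the file from the question, one way to call this would be to open a std::ifstream on "myFile.txt" (from <fstream>) and pass it to collect_from_lines. Returning a std::vector also removes the need for the fixed 10000-element array and the hand-written list of cout statements, since the result can be printed in a loop.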

Replace value in variable based on other variables

Assuming I have the following dataset, which has a few missing entries for Country:
clear
input strL Person strL Country Population
'ABC' "USA" 3999
'ABC' " " 544
'ABC' " " 7546
'ABD' "China" 10000
'BCG' "India" 6789
'BCG' " " 5454
'ABD' " " 10000
end
I wish to replace each missing country with the country recorded for the same Person. For example, for every observation with Person 'ABC', the country should be the same.
I need a solution that differs from manually scripting replace Country = "USA" if Person == "ABC" as my dataset has more than 10,000 unique observations for Person.
The dataset should look like the following:
Person Country Population
'ABC' "USA" 2514
'ABC' "USA" 388
'ABC' "USA" 8245
'ABD' "China" 10000
'BCG' "India" 6789
'BCG' "India" 5454
'ABD' "China" 10000
Your input and output don't match Stata standards. Stata does not use single quotes as string delimiters or show string delimiters in listings.
Stata doesn't regard one or more spaces as string missing.
Nevertheless this may help for a string variable such as Country:
clear
input strL Person strL Country Population
"ABC" "USA" 3999
"ABC" " " 544
"ABC" " " 7546
"ABD" "China" 10000
"BCG" "India" 6789
"BCG" " " 5454
"ABD" " " 10000
end
bysort Person (Country) : replace Country = Country[_N] if missing(trim(Country))
list, sepby(Person)
+-----------------------------+
| Person Country Popula~n |
|-----------------------------|
1. | ABC USA 7546 |
2. | ABC USA 544 |
3. | ABC USA 3999 |
|-----------------------------|
4. | ABD China 10000 |
5. | ABD China 10000 |
|-----------------------------|
6. | BCG India 5454 |
7. | BCG India 6789 |
+-----------------------------+

Renaming columns of dataframe with values from another dataframe

so I have a dataframe that roughly looks like this:
name1 name2 name3
123 456 678
123 456 678
123 456 678
and another dataframe that looks like this
name2 abc
name3 cdf
name1 fgh
Is there any way I can make the first dataframe column names like this:
fgh abc cdf
123 456 678
123 456 678
123 456 678
Thanks.
Use rename with a Series, after calling set_index to make column A the index:
print (df2)
A B
0 name2 abc
1 name3 cdf
2 name1 fgh
df1 = df1.rename(columns=df2.set_index('A')['B'])
print (df1)
fgh abc cdf
0 123 456 678
1 123 456 678
2 123 456 678
Detail:
print (df2.set_index('A')['B'])
A
name2 abc
name3 cdf
name1 fgh
Name: B, dtype: object
Or by dictionary created by zip:
df1 = df1.rename(columns=dict(zip(df2.A, df2.B)))
Detail:
print (dict(zip(df2.A, df2.B)))
{'name3': 'cdf', 'name1': 'fgh', 'name2': 'abc'}
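You can also use Series.get and assign the result back (here s is assumed to be a Series indexed by the old column names, with the new names as values):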
df.columns=s.get(df.columns)
df
Out[223]:
s1 fgh abc cdf
0 123 456 678
1 123 456 678
2 123 456 678

How to split character and numerical separately in R

I have a dataframe which looks like this:
df= data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
And I want to split this into a data frame with 3 columns so that the output looks like:
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
I have tried stringr() and grep() with limited success. The lack of a delimiter makes it a lot more difficult.
You could try
library(tidyr)
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 1 Alex 100.00
#2 2 Rina Faso 92.31
#3 3 john 50.00
str(res)
# 'data.frame': 3 obs. of 3 variables:
# $ name1: int 1 2 3
# $ name2: Factor w/ 3 levels "Alex","john",..: 1 3 2
# $ name3: num 100 92.3 50
Update
Based on 'df' from @DavidArenburg's post
res <- extract(df, name, into=c('name1', 'name2', 'name3'),
'(\\d+)([^0-9]+)([0-9.]+)', convert=TRUE)
res
# name1 name2 name3
#1 121 Réunion 13.76
#2 2 Côte d'Ivoire 22.40
#3 3 john 50.00
Try with str_match from stringr:
str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")
# [,1] [,2] [,3] [,4]
# [1,] "1Alex100.00" "1" "Alex" "100.00"
# [2,] "2Rina Faso92.31" "2" "Rina Faso" "92.31"
# [3,] "3john50.00" "3" "john" "50.00"
So as.data.frame(str_match(df$name, "^([0-9]*)([A-Za-z ]*)([0-9\\.]*)")[,-1]) should give you the desired result.
You could also do it like this.
> df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
> x <- do.call(rbind.data.frame, strsplit(as.character(df$name), "(?<=[A-Za-z])(?=\\d)|(?<=\\d)(?=[A-Za-z])", perl=T))
> colnames(x) <- c("name1", "name2", "name3")
> print(x, row.names=FALSE)
name1 name2 name3
1 Alex 100.00
12 Rina Faso 92.31
113 john 00.00
With base R it can be done too; it is a bit uglier, but it also works with special characters:
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "1" "Alex" "100.00"
# [2,] "2" "Rina Faso" "92.31"
# [3,] "3" "john" "50.00"
An example with special characters:
df = data.frame(name= c("121Réunion13.76","2Côte d'Ivoire22.40","3john50.00"))
with(df, cbind(sub("\\D.*", "", name),
gsub("[0-9.]", "", name),
gsub(".*[A-Za-z]", "", name)))
# [,1] [,2] [,3]
# [1,] "121" "Réunion" "13.76"
# [2,] "2" "Côte d'Ivoire" "22.40"
# [3,] "3" "john" "50.00"
Base R solutions that are not ugly:
proto=data.frame(name1=numeric(),name2=character(),name3=numeric())
strcapture("(\\d+)(\\D+)(.*)",as.character(df$name),proto)
name1 name2 name3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
read.table(text=gsub("(\\d+)(\\D+)(.*)","\\1|\\2|\\3",df$name),sep="|")
V1 V2 V3
1 1 Alex 100.00
2 12 Rina Faso 92.31
3 113 john 0.00
You could use the package unglue:
df <- data.frame(name= c("1Alex100.00","12Rina Faso92.31","113john00.00"))
library(unglue)
unglue_unnest(df, name, "{name1}{name2=\\D+}{name3}", convert = TRUE)
#> name1 name2 name3
#> 1 1 Alex 100.00
#> 2 12 Rina Faso 92.31
#> 3 113 john 0.00